<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to save running time for a large sample regression in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912186#M359630</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;I wonder if there is a way to reduce the running time of a large-sample regression. There are nearly five million observations, and the event rate is about 6%. Currently, running a logit regression on the full sample takes more than 20 hours. Is there a way to make it run faster?&lt;/P&gt;
&lt;P&gt;Thank you for your wisdom!&lt;/P&gt;
&lt;P&gt;L.&lt;/P&gt;</description>
    <pubDate>Fri, 19 Jan 2024 15:16:31 GMT</pubDate>
    <dc:creator>lichee</dc:creator>
    <dc:date>2024-01-19T15:16:31Z</dc:date>
    <item>
      <title>How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912186#M359630</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;I wonder if there is a way to reduce the running time of a large-sample regression. There are nearly five million observations, and the event rate is about 6%. Currently, running a logit regression on the full sample takes more than 20 hours. Is there a way to make it run faster?&lt;/P&gt;
&lt;P&gt;Thank you for your wisdom!&lt;/P&gt;
&lt;P&gt;L.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jan 2024 15:16:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912186#M359630</guid>
      <dc:creator>lichee</dc:creator>
      <dc:date>2024-01-19T15:16:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912187#M359631</link>
      <description>&lt;P&gt;How many X variables, plus interaction terms and power terms, are in the MODEL statement? Are some of the X variables in the CLASS statement? If so, how many variables and how many levels each?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are you using the SELECTION= option? If so, which method did you select?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are you using a BY statement?&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jan 2024 15:28:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912187#M359631</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-01-19T15:28:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912213#M359633</link>
      <description>All of the X variables are categorical and are listed in the CLASS statement. There are interaction terms of the 9 age bands with the female gender indicator, and 15 other binary indicators. I will also need to include 49 state indicators if I can cut the running time. Thank you very much!</description>
      <pubDate>Fri, 19 Jan 2024 16:52:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912213#M359633</guid>
      <dc:creator>lichee</dc:creator>
      <dc:date>2024-01-19T16:52:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912215#M359635</link>
      <description>&lt;P&gt;Having lots of class variables with lots of levels could be one reason why it takes 20 hours. Can you combine some levels (such that maybe you have only 5 age bands instead of 9)?&lt;/P&gt;
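&lt;P&gt;One way to try fewer levels without rebuilding the data set is to collapse them with a format, since CLASS levels are formed from formatted values. This is only a sketch: the variable name AgeBand, its coding as 1 to 9, the format name age5fmt, and the particular grouping are all assumptions for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical: AgeBand is numeric, coded 1-9; collapse to 5 groups */
proc format;
  value age5fmt
    1-2 = 'Bands 1-2'
    3-4 = 'Bands 3-4'
    5-6 = 'Bands 5-6'
    7-8 = 'Bands 7-8'
    9   = 'Band 9';
run;

proc logistic data=&amp;amp;datin.;
  /* CLASS groups observations by the formatted value of AgeBand */
  format AgeBand age5fmt.;
  /* If the CLASS list sets ref= for AgeBand, it must name a formatted value */
  class &amp;amp;XVAR_ref./PARAM=REF;
  model Dependent(ref='0')=&amp;amp;XVAR./firth parmlabel CLODDS=PL EXPB rsquare;
run;
&lt;/CODE&gt;&lt;/PRE&gt;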
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please answer my other questions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, are you using the EXACT statement in your PROC LOGISTIC?&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jan 2024 17:10:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912215#M359635</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-01-19T17:10:21Z</dc:date>
    </item>
    <item>
      <title>Re: How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912216#M359636</link>
      <description>I can combine age bands. I did not use the EXACT statement or the SELECTION= option.</description>
      <pubDate>Fri, 19 Jan 2024 17:17:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912216#M359636</guid>
      <dc:creator>lichee</dc:creator>
      <dc:date>2024-01-19T17:17:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912218#M359638</link>
      <description>&lt;P&gt;Please share your PROC LOGISTIC code so we don't have to guess what options are being used.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jan 2024 17:33:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912218#M359638</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-01-19T17:33:11Z</dc:date>
    </item>
    <item>
      <title>Re: How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912219#M359639</link>
      <description>&lt;P&gt;Here is the code:&lt;BR /&gt;proc logistic data=&amp;amp;datin.;&lt;BR /&gt;class &amp;amp;XVAR_ref./PARAM=REF; &lt;BR /&gt;model Dependent(ref='0')=&amp;amp;XVAR./firth parmlabel CLODDS=PL EXPB rsquare;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jan 2024 17:44:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912219#M359639</guid>
      <dc:creator>lichee</dc:creator>
      <dc:date>2024-01-19T17:44:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912221#M359640</link>
      <description>&lt;P&gt;So 17 class variables and you also want to add in state (with 49 levels). I think this is largely the problem and you should experiment with fewer variables and fewer levels of each variable.&lt;/P&gt;
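&lt;P&gt;To make that experimenting quicker, you could fit candidate models on a random subsample first. A minimal sketch, assuming a 10% simple random sample is large enough to be informative; the output data set name work.sample10 and the seed are arbitrary choices for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Draw a 10% simple random sample (rate and seed are illustrative) */
proc surveyselect data=&amp;amp;datin. out=work.sample10
                  method=srs samprate=0.10 seed=12345;
run;

/* Same model as before, fit on the subsample to compare run times */
proc logistic data=work.sample10;
  class &amp;amp;XVAR_ref./PARAM=REF;
  model Dependent(ref='0')=&amp;amp;XVAR./firth parmlabel CLODDS=PL EXPB rsquare;
run;
&lt;/CODE&gt;&lt;/PRE&gt;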
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, you will have to struggle with possible multicollinearity between the X variables, which I'm sure will be a problem if you want to interpret the results.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jan 2024 17:55:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912221#M359640</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-01-19T17:55:09Z</dc:date>
    </item>
    <item>
      <title>Re: How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912224#M359642</link>
      <description>I checked multicollinearity for the initial list of 25 class variables and ended up with 17 class variables that have no collinearity issue. However, the state indicators were not checked for collinearity, as I want to include them to control for state variation.</description>
      <pubDate>Fri, 19 Jan 2024 18:07:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912224#M359642</guid>
      <dc:creator>lichee</dc:creator>
      <dc:date>2024-01-19T18:07:04Z</dc:date>
    </item>
    <item>
      <title>Re: How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912242#M359644</link>
      <description>&lt;P&gt;You might also want to try changing the algorithm convergence criteria in the &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_syntax22.htm" target="_self"&gt;MODEL statement&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="PaigeMiller_0-1705691047076.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/92772iC9D6DF5C0DE33EEA/image-size/medium?v=v2&amp;amp;px=400" role="button" title="PaigeMiller_0-1705691047076.png" alt="PaigeMiller_0-1705691047076.png" /&gt;&lt;/span&gt;&lt;/P&gt;
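&lt;P&gt;As a rough sketch of what that could look like: GCONV= sets the relative gradient convergence criterion (the default is 1E-8), and ITPRINT prints the iteration history so you can see how many iterations are being used. The value 1E-6 below is only an illustrative assumption, not a recommendation; a looser criterion trades some precision in the estimates for fewer iterations.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc logistic data=&amp;amp;datin.;
  class &amp;amp;XVAR_ref./PARAM=REF;
  /* gconv=1e-6 is an illustrative value; itprint shows the iteration history */
  model Dependent(ref='0')=&amp;amp;XVAR./firth parmlabel CLODDS=PL EXPB rsquare
        gconv=1e-6 itprint;
run;
&lt;/CODE&gt;&lt;/PRE&gt;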
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jan 2024 19:04:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912242#M359644</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-01-19T19:04:53Z</dc:date>
    </item>
    <item>
      <title>Re: How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912378#M359678</link>
      <description>Try PROC HPLOGISTIC.&lt;BR /&gt;Procedures whose names start with HP are designed for big data, such as PROC HPGENSELECT and PROC HPMIXED.</description>
      <pubDate>Sun, 21 Jan 2024 09:26:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912378#M359678</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-01-21T09:26:11Z</dc:date>
    </item>
    <item>
      <title>Re: How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912380#M359679</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Large, complex models are much more likely to suffer from separation problems because the data becomes more sparse as the model becomes more complex.&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;So, I understand why you added the FIRTH option.&lt;BR /&gt;But in my experience, the FIRTH option also inflates execution time.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;(The FIRTH method uses an iterative maximum likelihood estimation algorithm to maximize a penalized likelihood function. The time needed depends on the number of parameters that must be estimated in each iteration and the number of iterations needed to achieve convergence. The amount of time needed will increase with each of these and cannot be known in advance. Note that both of these can be data dependent such that the same code applied to even slightly different data could result in very different time use.)&lt;/SPAN&gt;&lt;/P&gt;
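&lt;P&gt;One way to see how much of the run time comes from FIRTH is to fit the same model without it and compare the timings in the log. This is only a sketch, and it assumes separation turns out not to be a problem for your data (check the log for separation warnings before relying on the unpenalized fit).&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Same model without FIRTH, for a timing comparison only */
proc logistic data=&amp;amp;datin.;
  class &amp;amp;XVAR_ref./PARAM=REF;
  /* CLODDS=PL limits are also computed iteratively; CLODDS=WALD could be
     tried as a separate experiment if Wald limits are acceptable */
  model Dependent(ref='0')=&amp;amp;XVAR./parmlabel CLODDS=PL EXPB rsquare;
run;
&lt;/CODE&gt;&lt;/PRE&gt;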
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;This usage note discusses the separation issue:&lt;BR /&gt;&lt;/SPAN&gt;&lt;FONT size="4"&gt;Usage Note&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;I&gt;22599:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/I&gt;Understanding and correcting complete or quasi-complete separation problems&lt;BR /&gt;&lt;A href="https://support.sas.com/kb/22/599.html" target="_blank"&gt;https://support.sas.com/kb/22/599.html&lt;/A&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT size="4"&gt;BR, Koen&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 21 Jan 2024 13:33:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912380#M359679</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2024-01-21T13:33:37Z</dc:date>
    </item>
    <item>
      <title>Re: How to save running time for a large sample regression</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912398#M359688</link>
      <description>&lt;P&gt;1. Are you running on the data set that contains data for all 49 states?&lt;/P&gt;
&lt;P&gt;2. Do you want to run each state's data independently of the others by using a BY statement?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If so, you might experiment with how long some of the smaller states take. For example, try something equivalent to this:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc logistic data=&amp;amp;datin.;
where State in ('DE');
class &amp;amp;XVAR_ref./PARAM=REF;
model Dependent(ref='0')=&amp;amp;XVAR./firth parmlabel CLODDS=PL EXPB rsquare;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
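&lt;P&gt;To collect timings for a few states without editing the WHERE statement each time, something like the following might help. It is a sketch: the macro name time_state is hypothetical, and it assumes State holds two-letter codes. The FULLSTIMER system option writes detailed real and CPU time for each step to the log.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options fullstimer;  /* detailed timing for each step in the log */

/* Hypothetical helper: fit the model for one state */
%macro time_state(st);
  proc logistic data=&amp;amp;datin.;
    where State = "&amp;amp;st.";
    class &amp;amp;XVAR_ref./PARAM=REF;
    model Dependent(ref='0')=&amp;amp;XVAR./firth parmlabel CLODDS=PL EXPB rsquare;
  run;
%mend time_state;

%time_state(DE)
%time_state(TX)
&lt;/CODE&gt;&lt;/PRE&gt;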
&lt;P&gt;It might be that the small states complete quickly. Even the larger states (TX, CA, FL) might take only 20 minutes or less. (Try it out!)&amp;nbsp; After you get the preliminary timings, you might decide that you can run a BY-group analysis (BY State;) in a fraction of the time that it takes to run the full regression for all states combined.&lt;/P&gt;</description>
      <pubDate>Sun, 21 Jan 2024 23:18:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-save-running-time-for-a-large-sample-regression/m-p/912398#M359688</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2024-01-21T23:18:34Z</dc:date>
    </item>
  </channel>
</rss>

