<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic PROC HPGENSELECT with METHOD=LASSO runs forever in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-with-METHOD-LASSO-runs-forever/m-p/926040#M46053</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;I'm trying to use&amp;nbsp;PROC HPGENSELECT with METHOD=LASSO to select covariates. I have about 370,000 observations. When I tested the code with only 8 covariates, it finished in less than twenty minutes. When I included over 300 binary covariates (&amp;amp;add300cov.), it ran overnight and still did not finish. Can anyone see anything I should modify in my code below? Thanks a lot!&lt;/P&gt;
&lt;P&gt;proc hpgenselect data=population;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;class female(ref='0') race_ethncty(ref='1') &amp;amp;add300cov_ref.;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; model success(event="1") = age female race_ethncty var4 var5 var6 var7 var8&amp;nbsp; &amp;amp;add300cov./ dist=binary include=(age female race_ethncty var4 var5 var6 var7 var8);&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; selection method=lasso;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;</description>
    <pubDate>Fri, 26 Apr 2024 15:37:41 GMT</pubDate>
    <dc:creator>lichee</dc:creator>
    <dc:date>2024-04-26T15:37:41Z</dc:date>
    <item>
      <title>PROC HPGENSELECT with METHOD=LASSO runs forever</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-with-METHOD-LASSO-runs-forever/m-p/926040#M46053</link>
      <pubDate>Fri, 26 Apr 2024 15:37:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-with-METHOD-LASSO-runs-forever/m-p/926040#M46053</guid>
      <dc:creator>lichee</dc:creator>
      <dc:date>2024-04-26T15:37:41Z</dc:date>
    </item>
    <item>
      <title>Re: PROC HPGENSELECT with METHOD=LASSO runs forever</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-with-METHOD-LASSO-runs-forever/m-p/926054#M46054</link>
      <description>&lt;P&gt;A model with 300 binary covariates will most likely take an extremely long time to fit, and I don't think there's any way around that, even with an HP procedure. However, PROC HPGENSELECT has optimization and convergence-tolerance options; you could try those and see whether they help (I'm guessing they would).&lt;/P&gt;
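&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a rough sketch of where such options go: TECHNIQUE= and MAXITER= on the PROC HPGENSELECT statement control the optimizer, and the PERFORMANCE statement with the DETAILS option prints a timing breakdown of the run. The values below (quasi-Newton via TECHNIQUE=QUANEW, MAXITER=200) are illustrative assumptions, not tuned settings. Whether they speed up the LASSO steps themselves is something you would have to test, so check the option list in the documentation for your SAS release.&lt;/P&gt;
&lt;PRE&gt;/* Sketch only: TECHNIQUE=QUANEW and MAXITER=200 are illustrative choices, not tuned values */
proc hpgenselect data=population technique=quanew maxiter=200;
   class female(ref='0') race_ethncty(ref='1') &amp;amp;add300cov_ref.;
   model success(event="1") = age female race_ethncty var4 var5 var6 var7 var8 &amp;amp;add300cov.
         / dist=binary include=(age female race_ethncty var4 var5 var6 var7 var8);
   selection method=lasso;
   /* DETAILS adds a timing table showing where the procedure spends its time */
   performance details;
run;&lt;/PRE&gt;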
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As for what else to try, you could use a chi-squared test to see which of the binary predictors are most strongly associated with the binary response (a two-way table in PROC FREQ with the CHISQ option will get you there), and then pick just a few of the best binary predictors to use in the model.&lt;/P&gt;
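&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is a minimal sketch of that screening step. It assumes &amp;amp;add300cov. holds a space-separated list of the 0/1 variable names and that each covariate has both levels present in the data. Because every table is then 2x2 with 1 degree of freedom, ranking by the chi-square statistic gives the same order as ranking by p-value, and it sidesteps the many p-values that will be vanishingly small with 370,000 observations.&lt;/P&gt;
&lt;PRE&gt;/* Sketch: one chi-square test per binary covariate against SUCCESS.
   Assumes &amp;amp;add300cov. is a space-separated list of 0/1 variable names. */
proc freq data=population;
   tables (&amp;amp;add300cov.) * success / chisq norow nocol nopercent;
   ods output ChiSq=chisq_all;   /* all test statistics, one row per statistic per table */
run;

/* Keep the Pearson chi-square rows and rank them, largest statistic first */
proc sort data=chisq_all(where=(Statistic='Chi-Square')) out=chisq_ranked;
   by descending Value;
run;

/* Inspect the top candidates before choosing which ones to keep */
proc print data=chisq_ranked(obs=20);
   var Table Value Prob;
run;&lt;/PRE&gt;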
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thinking outside the box, another approach is to use logistic Partial Least Squares with all 300 binary predictors. I have no doubt that, even with 300 binary predictors, it would finish much more quickly; I would be surprised if it took even an hour. However, the only software I know of that does this is the R package plsRglm (&lt;A href="https://cran.r-project.org/web/packages/plsRglm/plsRglm.pdf" target="_blank" rel="noopener"&gt;https://cran.r-project.org/web/packages/plsRglm/plsRglm.pdf&lt;/A&gt;).&lt;/P&gt;</description>
      <pubDate>Fri, 26 Apr 2024 16:38:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-with-METHOD-LASSO-runs-forever/m-p/926054#M46054</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-04-26T16:38:13Z</dc:date>
    </item>
    <item>
      <title>Re: PROC HPGENSELECT with METHOD=LASSO runs forever</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-with-METHOD-LASSO-runs-forever/m-p/926107#M46057</link>
      <description>&lt;P&gt;By any chance, did you run a system performance monitor while that code was running?&lt;/P&gt;
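&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If a system monitor isn't handy, SAS can report some of the same information itself. As a minimal sketch, assuming you can set system options in your session, the FULLSTIMER option adds real time, CPU time and memory statistics to the log notes for each step. As a rough rule of thumb, not a definitive diagnosis, a real time far above the CPU time suggests the step is waiting on I/O rather than computing.&lt;/P&gt;
&lt;PRE&gt;/* FULLSTIMER writes real time, CPU time and memory usage to the log after each step */
options fullstimer;

/* Re-run the PROC HPGENSELECT step from the original post here,
   then compare the real time, CPU time and memory figures in its log notes */

options nofullstimer;   /* switch back to the shorter log notes */&lt;/PRE&gt;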
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I suspect you might see a lot of disk activity and possibly very high memory usage. When a calculation involves many terms, SAS can end up spending much of its time writing data to and reading data from temporary storage rather than computing.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Apr 2024 20:27:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-with-METHOD-LASSO-runs-forever/m-p/926107#M46057</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-04-26T20:27:36Z</dc:date>
    </item>
  </channel>
</rss>

