<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Lasso in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/730989#M35447</link>
    <description>Sorry, maybe I was not clear enough. I can save the data in a separate dataset; that is not the problem. The problem is that it does not work in the lasso code: the valdata=... step does not work.</description>
    <pubDate>Fri, 02 Apr 2021 14:54:28 GMT</pubDate>
    <dc:creator>DomUk</dc:creator>
    <dc:date>2021-04-02T14:54:28Z</dc:date>
    <item>
      <title>Lasso</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/730806#M35445</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I would like to use the lasso to exclude unimportant variables.&lt;/P&gt;&lt;P&gt;I would like to fit the regression on 5 years of data (2000-2004) and validate it on the year 2005. My dataset covers the years 1980-2020, so does anyone have an idea how I could handle this? I tried to save all data from 2005 in a new dataset, but it doesn't work. I think the starting point is something like this:&lt;/P&gt;&lt;P&gt;proc glmselect data=mylib. dataset plots=all seed=123 valdata= ??? ;&lt;BR /&gt;where 2000 &amp;lt;= year &amp;lt;= 2004 ;&lt;BR /&gt;model y= x1........x100&lt;BR /&gt;/selection= lasso (stop=none choose=validate);&lt;BR /&gt;ods output parameterestimates= check_lasso_parms;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks a lot for any answer.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Apr 2021 18:36:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/730806#M35445</guid>
      <dc:creator>DomUk</dc:creator>
      <dc:date>2021-04-01T18:36:34Z</dc:date>
    </item>
    <item>
      <title>Re: Lasso</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/730948#M35446</link>
      <description>&lt;P&gt;How come you cannot save your 2005 data in a separate dataset?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;All you should do is this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
 set have;
 where year(your_date_var)=2005;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
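&lt;P&gt;Take care: there is a substantial difference between validation data (VALDATA=) and test data (TESTDATA=). Validation data is used to choose among the candidate models during selection, whereas test data is only used to assess the selected model.&lt;/P&gt;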
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, for more information on LASSO, I recommend this paper:&lt;/P&gt;
&lt;P&gt;SAS Global Forum 2020&lt;BR /&gt;Paper SAS4287-2020&lt;BR /&gt;A Survey of Methods in Variable Selection and Penalized Regression&lt;BR /&gt;Yingwei Wang, SAS Institute Inc.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4287-2020.pdf" target="_blank"&gt;https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4287-2020.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cheers,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;</description>
      <pubDate>Fri, 02 Apr 2021 10:38:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/730948#M35446</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-04-02T10:38:06Z</dc:date>
    </item>
    <item>
      <title>Re: Lasso</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/730989#M35447</link>
      <description>Sorry, maybe I was not clear enough. I can save the data in a separate dataset; that is not the problem. The problem is that it does not work in the lasso code: the valdata=... step does not work.</description>
      <pubDate>Fri, 02 Apr 2021 14:54:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/730989#M35447</guid>
      <dc:creator>DomUk</dc:creator>
      <dc:date>2021-04-02T14:54:28Z</dc:date>
    </item>
    <item>
      <title>Re: Lasso</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/730993#M35448</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;I would be astonished if VALDATA= does not work when used appropriately.&lt;/P&gt;
&lt;P&gt;In cases like this, it's always best to include the LOG.&lt;/P&gt;
&lt;P&gt;Please include the LOG by using the 'Insert Code' icon (&amp;lt;/&amp;gt;) above your entry, so that the LOG does not lose its structure and formatting.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Apr 2021 15:23:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/730993#M35448</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-04-02T15:23:17Z</dc:date>
    </item>
    <item>
      <title>Re: Lasso</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/731289#M35469</link>
      <description>Thanks for your answer. I decided to work with cross validation, but I am still interested in what the problem is.&lt;BR /&gt;The log is as follows:&lt;BR /&gt;&lt;BR /&gt;2 Data test;&lt;BR /&gt;3 set mylibf1.endversion;&lt;BR /&gt;4 where houyear=2005;&lt;BR /&gt;5 run;&lt;BR /&gt;&lt;BR /&gt;NOTE: There were 1859 observations read from the data set MYLIBF1.ENDVERSION.&lt;BR /&gt;WHERE houyear=2005;&lt;BR /&gt;NOTE: The data set WORK.TEST has 1859 observations and 29 variables.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt;real time 0.06 seconds&lt;BR /&gt;cpu time 0.04 seconds&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;6 proc glmselect data=mylibf1.endversion plots=all seed=123 valdata=test;&lt;BR /&gt;NOTE: Writing HTML Body file: sashtml.htm&lt;BR /&gt;7 where 2000 &amp;lt;= Houyear &amp;lt;= 2004;&lt;BR /&gt;8 model F1_Earn_s_t= BV_s_t negE negEE_s_t c_sales_s_t c_cogs_s_t c_oe_s_t c_int_s_t c_tax_s_t&lt;BR /&gt;8 ! c_other_s_t del_ar_s_t del_inv_s_t del_ap_s_t depr_s_t amort_s_t oth_acc_s_t&lt;BR /&gt;9 /selection= lasso (stop=none choose=validate) ;&lt;BR /&gt;10 ods output parameterestimates= test;&lt;BR /&gt;11 run;&lt;BR /&gt;&lt;BR /&gt;ERROR: Selection aborted as there are no suitable observations for validation.&lt;BR /&gt;NOTE: The SAS System stopped processing this step because of errors.&lt;BR /&gt;WARNING: Output 'parameterestimates' was not created. Make sure that the output object name,&lt;BR /&gt;label, or path is spelled correctly. Also, verify that the appropriate procedure options&lt;BR /&gt;are used to produce the requested output object.
For example, verify that the NOPRINT&lt;BR /&gt;option is not used.&lt;BR /&gt;NOTE: There were 7513 observations read from the data set MYLIBF1.ENDVERSION.&lt;BR /&gt;WHERE (Houyear&amp;gt;=2000 and Houyear&amp;lt;=2004);&lt;BR /&gt;NOTE: PROCEDURE GLMSELECT used (Total process time):&lt;BR /&gt;real time 0.99 seconds&lt;BR /&gt;cpu time 0.25 seconds&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;12 Data test;&lt;BR /&gt;13 set mylibf1.endversion;&lt;BR /&gt;14 where houyear=2005;&lt;BR /&gt;15 run;&lt;BR /&gt;&lt;BR /&gt;ERROR: You cannot open WORK.TEST.DATA for output access with member-level control because&lt;BR /&gt;WORK.TEST.DATA is in use by you in resource environment ViewTable Window.&lt;BR /&gt;NOTE: The SAS System stopped processing this step because of errors.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt;real time 0.01 seconds&lt;BR /&gt;cpu time 0.01 seconds&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;16 Data test;&lt;BR /&gt;17 set mylibf1.endversion;&lt;BR /&gt;18 where houyear=2005;&lt;BR /&gt;19 run;&lt;BR /&gt;&lt;BR /&gt;NOTE: There were 1859 observations read from the data set MYLIBF1.ENDVERSION.&lt;BR /&gt;WHERE houyear=2005;&lt;BR /&gt;NOTE: The data set WORK.TEST has 1859 observations and 29 variables.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt;real time 0.03 seconds&lt;BR /&gt;cpu time 0.03 seconds&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;20 proc glmselect data=mylibf1.endversion plots=all seed=123 valdata=test;&lt;BR /&gt;21 where 2000 &amp;lt;= Houyear &amp;lt;= 2004;&lt;BR /&gt;22 model F1_Earn_s_t= BV_s_t negE negEE_s_t c_sales_s_t c_cogs_s_t c_oe_s_t c_int_s_t c_tax_s_t&lt;BR /&gt;22 ! 
c_other_s_t del_ar_s_t del_inv_s_t del_ap_s_t depr_s_t amort_s_t oth_acc_s_t&lt;BR /&gt;23 /selection= lasso (stop=none choose=validate) ;&lt;BR /&gt;24 ods output parameterestimates= test_2;&lt;BR /&gt;25 run;&lt;BR /&gt;&lt;BR /&gt;ERROR: Selection aborted as there are no suitable observations for validation.&lt;BR /&gt;NOTE: The SAS System stopped processing this step because of errors.&lt;BR /&gt;WARNING: Output 'parameterestimates' was not created. Make sure that the output object name,&lt;BR /&gt;label, or path is spelled correctly. Also, verify that the appropriate procedure options&lt;BR /&gt;are used to produce the requested output object. For example, verify that the NOPRINT&lt;BR /&gt;option is not used.&lt;BR /&gt;NOTE: There were 7513 observations read from the data set MYLIBF1.ENDVERSION.&lt;BR /&gt;WHERE (Houyear&amp;gt;=2000 and Houyear&amp;lt;=2004);&lt;BR /&gt;NOTE: PROCEDURE GLMSELECT used (Total process time):&lt;BR /&gt;real time 0.18 seconds&lt;BR /&gt;cpu time 0.06 seconds&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sun, 04 Apr 2021 22:19:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/731289#M35469</guid>
      <dc:creator>DomUk</dc:creator>
      <dc:date>2021-04-04T22:19:26Z</dc:date>
    </item>
    <item>
      <title>Re: Lasso</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/731327#M35476</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I haven't tested it (I leave that up to you &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt; ) but&amp;nbsp;&lt;SPAN&gt;I would guess that the WHERE statement applies to all input data sets, including the VALDATA= data set. Since that data set contains only 2005 and the filter keeps 2000-2004, no observations qualify for validation, which is of course a problem with CHOOSE=VALIDATE.&lt;/SPAN&gt;&lt;/P&gt;
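&lt;P&gt;A minimal, untested sketch of one way around this, assuming the guess above is right: move the year filter from the WHERE statement into a WHERE= data set option on DATA=, so that it cannot touch the VALDATA= data set. The library, data set and variable names are taken from the log earlier in the thread; VALID2005 and LASSO_PARMS are made-up names used only for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Validation data: year 2005 only (VALID2005 is a hypothetical name) */
data valid2005;
 set mylibf1.endversion;
 where houyear=2005;
run;

/* Training filter goes on DATA= only, via WHERE=; no WHERE statement is used */
proc glmselect data=mylibf1.endversion(where=(houyear between 2000 and 2004))
               valdata=valid2005 plots=all seed=123;
 model F1_Earn_s_t = BV_s_t negE negEE_s_t c_sales_s_t c_cogs_s_t c_oe_s_t
                     c_int_s_t c_tax_s_t c_other_s_t del_ar_s_t del_inv_s_t
                     del_ap_s_t depr_s_t amort_s_t oth_acc_s_t
       / selection=lasso(stop=none choose=validate);
 /* Hypothetical output name, kept distinct from the validation data set */
 ods output parameterestimates=lasso_parms;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Writing the ODS output to its own data set also avoids a clash with WORK.TEST, which the first run in the log used both as VALDATA= and as the ODS OUTPUT target.&lt;/P&gt;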
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Cheers,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Koen&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 05 Apr 2021 10:06:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Lasso/m-p/731327#M35476</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-04-05T10:06:27Z</dc:date>
    </item>
  </channel>
</rss>

