<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Automating Variable Selection in SAS Base in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Automating-Variable-Selection-in-SAS-Base/m-p/430235#M6596</link>
    <description></description>
    <pubDate>Wed, 24 Jan 2018 00:54:12 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2018-01-24T00:54:12Z</dc:date>
    <item>
      <title>Automating Variable Selection in SAS Base</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Automating-Variable-Selection-in-SAS-Base/m-p/430126#M6595</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm working on a prediction problem where the target variable can take hundreds of values. My objective is not to be able to exactly predict the target variable...that would be too difficult!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I'm trying to do is to create a 'top 5' of the most likely targets. My current approach is to create as many binary models as there are values the target variable can take.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So for example, if the target variable can be 'a', 'b' or 'c', I would create&amp;nbsp;the following 3 models:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Model 1: Predict 'a' vs 'non-a'&lt;/P&gt;&lt;P&gt;Model 2: Predict 'b' vs 'non-b'&lt;/P&gt;&lt;P&gt;Model 3: Predict 'c' vs 'non-c'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Except, I'm doing hundreds of them.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Once I have my models, I score the data using each one of them. I then rank the scores from highest to lowest, and keep the top 5.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So far, so good! 
I got that to work fine with the code below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;%macro m1 ();&lt;BR /&gt;%local i next ;&lt;BR /&gt;%let i=1;&lt;BR /&gt;%do i=1 %to &amp;amp;clust_nb.;&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;data _null_;&lt;BR /&gt;&amp;nbsp;set Training.ref_clust&amp;amp;i;&lt;BR /&gt;&amp;nbsp;call execute('&lt;BR /&gt;&amp;nbsp;proc hpforest data=Training.training_clust'||STRIP(&amp;amp;i)||' VARS_TO_TRY=40;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;//my input variables&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&amp;nbsp;target target'||STRIP(TARGET_CODE)||'/level=binary;&lt;/P&gt;&lt;P&gt;&amp;nbsp;ODS output VariableImportance=VARIMP.VARIMP_CLUST'||STRIP(&amp;amp;i)||'_Target'||STRIP(TARGET_CODE)||';&lt;BR /&gt;&amp;nbsp;save FILE=''/path/Cluster'||STRIP(&amp;amp;i)||'_target'||STRIP(TARGET_CODE)||''';&lt;BR /&gt;&amp;nbsp;run;&lt;BR /&gt;&amp;nbsp;');&lt;BR /&gt;run;&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;%end;&lt;BR /&gt;%mend m1;&lt;BR /&gt;%m1();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have over 500 input variables. However, I know that for each model, only 10-30 are relevant (and these 10-30 relevant input variables are different for each model, which explains why I start with 500 variables).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here's what I would like to do:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For each of my hpforest models, I would like to identify the few variables that are relevant for a given target value. 
Instead of training my hundreds of models on 500 input variables, I would be training each one of them on just the relevant variables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would basically like to apply the SAS EM&amp;nbsp;'Variable Selection' node before running each of my models...but in my SAS Base loop above.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm still very new to SAS and I'm having a hard time thinking of how I could do this efficiently.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does anyone have suggestions?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jan 2018 19:16:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Automating-Variable-Selection-in-SAS-Base/m-p/430126#M6595</guid>
      <dc:creator>Charlot</dc:creator>
      <dc:date>2018-01-23T19:16:56Z</dc:date>
    </item>
    <item>
      <title>Re: Automating Variable Selection in SAS Base</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Automating-Variable-Selection-in-SAS-Base/m-p/430235#M6596</link>
      <description>&lt;P&gt;The first thing you might try is making the code a little simpler, which will make it easier to read.&lt;/P&gt;
&lt;P&gt;Instead of:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;proc hpforest data=Training.training_clust'||STRIP(&amp;amp;i)||' VARS_TO_TRY=40;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;try&lt;/P&gt;
&lt;P&gt;&amp;nbsp;call execute ("proc hpforest data=Training.training_clust&amp;amp;i VARS_TO_TRY=40;");&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
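&lt;P&gt;As a minimal sketch of why the double-quoted version works (the value 3 for i is only an illustrative assumption): macro variable references are resolved inside double-quoted strings but not inside single-quoted ones, so with double quotes there is no need to concatenate the value in.&lt;/P&gt;
&lt;PRE&gt;%let i = 3;   /* illustrative value, not from the original code */
data _null_;
  put "training_clust&amp;amp;i";   /* double quotes: writes training_clust3 */
  put 'training_clust&amp;amp;i';   /* single quotes: writes training_clust&amp;amp;i */
run;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;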
&lt;P&gt;With call execute it is often easier to resolve the strings into a variable first and then pass the resolved variable to the call. You also do NOT need to make an entire proc a single call execute statement. Since call execute stacks up code, you can submit partial lines as long as the combined result is complete syntax. Example:&lt;/P&gt;
&lt;PRE&gt;data _null_;
call execute ("Proc print");
call execute ("data=sashelp.class");
call execute (";");
call execute ("run;");
run;&lt;/PRE&gt;
&lt;P&gt;This generally makes keeping the quote marks straight a lot easier. You can even test the code by using PUT instead of call execute to examine the generated code in the log.&lt;/P&gt;
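&lt;P&gt;For instance, a sketch of that approach using sashelp.class (the variable name code and the where clause are only illustrative assumptions): build the statement in a character variable first, write it to the log with PUT, and switch to call execute once the text looks right.&lt;/P&gt;
&lt;PRE&gt;data _null_;
  set sashelp.class (obs=2);
  length code $200;
  /* resolve the string first; CATS strips blanks from each piece before joining */
  code = cats("proc print data=sashelp.class (where=(name='", name, "')); run;");
  put code;                  /* write the generated code to the log for checking */
  /* call execute(code); */  /* uncomment once the generated code looks right */
run;&lt;/PRE&gt;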
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When asking for help with macro development, it helps to 1) show a solution that does what you want and works without any macro code, 2) indicate the pieces you want to change (two worked examples may help), and&amp;nbsp;3) say where the values to change would come from (data set, prompt or programmer).&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jan 2018 00:54:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Automating-Variable-Selection-in-SAS-Base/m-p/430235#M6596</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2018-01-24T00:54:12Z</dc:date>
    </item>
  </channel>
</rss>

