Automating Variable Selection in SAS Base

Hi,

 

I'm working on a prediction problem where the target variable can take hundreds of values. My objective is not to be able to exactly predict the target variable...that would be too difficult!

 

What I'm trying to do is to create a 'top 5' of the most likely targets. My current approach is to create as many binary models as there are values the target variable can take.

 

So for example, if the target variable can be 'a', 'b' or 'c', I would create the following 3 models:

 

Model 1: Predict 'a' vs 'non-a'

Model 2: Predict 'b' vs 'non-b'

Model 3: Predict 'c' vs 'non-c'

 

Except, I'm doing hundreds of them.

 

Once I have my models, I score the data using each one of them. I then rank the scores from highest to lowest, and keep the top 5.
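A minimal sketch of that ranking step, assuming the scores from all models have been stacked into one long table. The table WORK.SCORES and the columns ID, TARGET_CODE and P_TARGET are hypothetical names used only for illustration:

```sas
/* Hypothetical input: WORK.SCORES has one row per observation (ID) and   */
/* candidate target (TARGET_CODE), with that model's score in P_TARGET.   */
proc sort data=work.scores out=work.scores_sorted;
  by id descending p_target;
run;

/* Number the rows within each ID from the highest score down, keep 5. */
data work.top5;
  set work.scores_sorted;
  by id;
  if first.id then score_rank = 0;
  score_rank + 1;              /* sum statement: value is retained across rows */
  if score_rank <= 5;          /* subsetting IF keeps only the top 5 per ID */
run;
```

Ties are broken by whatever order PROC SORT leaves them in, which may or may not matter for a top-5 list.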

 

So far, so good! I got that to work fine with the code below:

 

%macro m1();
%local i;
/* One pass per cluster. clust_nb is a macro variable holding the number of clusters. */
%do i=1 %to &clust_nb.;

/* Each row of ref_clust supplies one TARGET_CODE. CALL EXECUTE queues one PROC HPFOREST per row. */
data _null_;
 set Training.ref_clust&i;
 call execute('
 proc hpforest data=Training.training_clust'||STRIP(&i)||' VARS_TO_TRY=40;

  /* my input variables */

 target target'||STRIP(TARGET_CODE)||'/level=binary;

 ODS output VariableImportance=VARIMP.VARIMP_CLUST'||STRIP(&i)||'_Target'||STRIP(TARGET_CODE)||';
 save FILE=''/path/Cluster'||STRIP(&i)||'_target'||STRIP(TARGET_CODE)||''';
 run;
 ');
run;

%end;
%mend m1;
%m1();

 

 

I have over 500 input variables. However, I know that for each model, only 10-30 are relevant (and these 10-30 relevant input variables are different for each model, which explains why I start with 500 variables).

 

Here's what I would like to do:

 

For each of my hpforest runs, I would like to identify the few variables that are relevant for a given target value. Instead of training my hundreds of models on 500 input variables, I would be training each one of them on just the relevant variables.

 

I would basically like to apply the SAS EM 'Variable Selection' node before running each of my models...but in my SAS Base loop above.

 

I'm still very new to SAS and I'm having a hard time thinking of how I could do this efficiently.

 

Does anyone have suggestions?

 

Thanks!

Re: Automating Variable Selection in SAS Base

First, you might try making the code a little simpler, which will make it easier to read.

Instead of:

 proc hpforest data=Training.training_clust'||STRIP(&i)||' VARS_TO_TRY=40;

 

try

 call execute ("proc hpforest data=Training.training_clust&i VARS_TO_TRY=40;");

 

With call execute it is often easier to resolve the strings first and then call the resolved variable. You also do NOT need to make an entire proc a single call execute statement. Since call execute stacks up code, you can pass partial lines as long as the result is complete syntax. Example:

data _null_;
call execute ("Proc print");
call execute ("data=sashelp.class");
call execute (";");
call execute ("run;");
run;

This generally makes keeping the quote marks straight a lot easier. You can even test the code by using PUT instead of call execute to examine the generated code.
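As a rough sketch of that PUT check: build the statement in a character variable first, write it to the log, and only switch to call execute once the text looks right. The %let value and the variable name stmt are placeholders for illustration:

```sas
%let i = 1;   /* hypothetical cluster number, for illustration only */

data _null_;
  length stmt $200;
  /* Double quotes let the macro variable resolve inside the string. */
  stmt = "proc hpforest data=Training.training_clust&i VARS_TO_TRY=40;";
  put stmt;                  /* writes the generated statement to the log */
  /* call execute(stmt); */  /* enable once the log output looks correct */
run;
```

Nothing is submitted while the call execute line stays commented out, so this is safe to run against any data.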

 

 

When asking for help with macro development, it helps to show 1) a solution that does what you want and works without any macro code, 2) the pieces you want to change (two worked examples may help), and 3) where the values to change would come from (data set, prompt or programmer).
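For instance, a macro-free version for a single cluster and target might look like the sketch below. It is only one possible approach: fit once on all inputs, keep the highest-ranked variables from the VariableImportance table, then refit on those. The input list x1-x500, the target name target_a, the cutoff of 30, the output file name, and the column names Variable and Gini are all assumptions; check the importance table with PROC CONTENTS before relying on them.

```sas
/* All data set, variable and file names below are hypothetical placeholders. */

/* Pass 1: fit on every candidate input and capture variable importance. */
proc hpforest data=Training.training_clust1 vars_to_try=40;
  input x1-x500 / level=interval;       /* placeholder input list */
  target target_a / level=binary;       /* placeholder target */
  ods output VariableImportance=work.varimp;
run;

/* Rank by an importance column. Gini is an assumed column name. */
proc sort data=work.varimp out=work.varimp_sorted;
  by descending Gini;
run;

/* Store the names of the 30 highest-ranked variables in a macro variable. */
proc sql noprint;
  select Variable into :keep_vars separated by ' '
  from work.varimp_sorted(obs=30);
quit;

/* Pass 2: refit on the selected inputs only and save the model. */
proc hpforest data=Training.training_clust1;
  input &keep_vars / level=interval;
  target target_a / level=binary;
  save file='/path/Cluster1_targeta';
run;
```

Once this works for one cluster and target, the pieces that change (cluster number, target code, cutoff) are the ones to drive from the loop.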
