Charlot
Fluorite | Level 6

Hi,

 

I'm working on a prediction problem where the target variable can take hundreds of values. My objective is not to predict the target variable exactly; that would be too difficult!

 

What I'm trying to do is to create a 'top 5' of the most likely targets. My current approach is to create as many binary models as there are values the target variable can take.

 

So for example, if the target variable can be 'a', 'b' or 'c', I would create the following 3 models:

 

Model 1: Predict 'a' vs 'non-a'

Model 2: Predict 'b' vs 'non-b'

Model 3: Predict 'c' vs 'non-c'

 

Except, I'm doing hundreds of them.

 

Once I have my models, I score the data using each one of them. I then rank the scores from highest to lowest, and keep the top 5.
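
A minimal sketch of that ranking step, assuming a hypothetical scored table with one row per observation and one score column per target. The names scores, id, p_a to p_f, long, top5 and rnk are made up for illustration and are not taken from the actual project:

```sas
/* Hypothetical scored data: one score column per binary model */
data scores;
  input id p_a p_b p_c p_d p_e p_f;
  datalines;
1 0.10 0.40 0.05 0.30 0.02 0.13
2 0.50 0.05 0.20 0.10 0.10 0.05
;
run;

/* Reshape to one row per (id, target) pair; input must be sorted by id */
proc transpose data=scores out=long(rename=(_name_=target col1=score));
  by id;
  var p_:;
run;

/* Highest score first within each id */
proc sort data=long;
  by id descending score;
run;

/* Keep the five highest-scoring targets per id */
data top5;
  set long;
  by id;
  if first.id then rnk = 0;
  rnk + 1;
  if rnk <= 5;
run;
```

With hundreds of models the same pattern should apply; only the list of score columns on the VAR statement grows.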

 

So far, so good! I got that to work fine with the code below:

 

%macro m1();
%local i;
%do i=1 %to &clust_nb.;

/* Each row of ref_clust&i queues one proc hpforest run for that row's TARGET_CODE */
data _null_;
 set Training.ref_clust&i;
 call execute('
 proc hpforest data=Training.training_clust'||STRIP(&i)||' VARS_TO_TRY=40;

  /* my input variables */

 target target'||STRIP(TARGET_CODE)||'/level=binary;

 ODS output VariableImportance=VARIMP.VARIMP_CLUST'||STRIP(&i)||'_Target'||STRIP(TARGET_CODE)||';
 save FILE=''/path/Cluster'||STRIP(&i)||'_target'||STRIP(TARGET_CODE)||''';
 run;
 ');
run;

%end;
%mend m1;
%m1();

 

 

I have over 500 input variables. However, I know that for each model only 10-30 are relevant, and these 10-30 relevant input variables are different for each model, which is why I start with all 500.

 

Here's what I would like to do:

 

For each of my hpforest models, I would like to identify the few variables that are relevant for a given target value. Instead of training my hundreds of models on 500 input variables, I would train each one of them on just its relevant variables.

 

I would basically like to apply the SAS EM 'Variable Selection' node before running each of my models, but in my SAS Base loop above.

 

I'm still very new to SAS and I'm having a hard time thinking of how I could do this efficiently.

 

Does anyone have suggestions?

 

Thanks!

1 REPLY
ballardw
Super User

The first thing you might try is making the code a little simpler, which will make it easier to read.

Instead of:

 proc hpforest data=Training.training_clust'||STRIP(&i)||' VARS_TO_TRY=40;

 

try

 call execute ("proc hpforest data=Training.training_clust&i VARS_TO_TRY=40;");

 

With call execute it is often easier to resolve the strings into a variable first and then call execute on the resolved variable. You also do NOT need to make an entire proc a single call execute statement. Since call execute stacks up code, you can pass partial lines as long as the result is complete syntax. Example:

data _null_;
call execute ("Proc print");
call execute ("data=sashelp.class");
call execute (";");
call execute ("run;");
run;

This generally makes keeping the quote marks straight a lot easier. You can even test the code by using PUT instead of call execute to examine the generated code.
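
As a minimal sketch of both ideas together: resolve the string into a variable first, then preview it with PUT before executing it. The variable names stmt and n and the obs= value are only illustrative assumptions:

```sas
data _null_;
  length stmt $200;
  n = 5;  /* illustrative value; in a real job this would come from the data */

  /* Build the statement in a variable first, so the quoting is easy to check */
  stmt = cats('proc print data=sashelp.class(obs=', n, ');');

  put stmt;             /* preview the generated code in the log */
  call execute(stmt);   /* comment this line out while debugging */
  call execute('run;');
run;
```

Once the text written to the log looks like valid SAS code, the call execute lines can be left in place and the PUT removed.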

 

 

With macro development it helps to show 1) a solution that works and does what you want without any macro code, 2) the pieces you want to change (two worked examples may help), and 3) where the values to change would come from (data set, prompt, or programmer).
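
On the variable selection question itself, one possible approach (not the same method as the Enterprise Miner node) is a two-pass run: fit a first forest on all inputs, keep only the top-ranked variables from its importance table, and refit on that short list. A rough sketch of the selection step, assuming a hypothetical importance table. The data set varimp, the columns variable and importance, and the macro variables n_keep and keepvars are all made-up names, so check them against the table your ODS OUTPUT statement actually writes:

```sas
/* Hypothetical importance table; in practice this would be the data set
   written by ODS OUTPUT VariableImportance= in a first-pass forest.
   Column names here are assumptions for illustration. */
data varimp;
  length variable $32;
  input variable $ importance;
  datalines;
x1 0.02
x2 0.31
x3 0.00
x4 0.18
x5 0.09
;
run;

%let n_keep = 3;  /* how many inputs to keep per model (assumption) */

/* Most important variables first */
proc sort data=varimp out=varimp_sorted;
  by descending importance;
run;

/* Put the top n_keep variable names into one macro variable */
proc sql noprint;
  select variable into :keepvars separated by ' '
  from varimp_sorted(obs=&n_keep);
quit;

%put &=keepvars;
```

The resulting &keepvars could then go on the INPUT statement of a second proc hpforest run for that target, so each model is refit on its own short list.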
