ugly_duck_ling
Calcite | Level 5

I have a dataset that I need help with.

 

data help;
input modelstep var1 $ var2 $ var3 $ var4 $ var5 $ outcome auc;
datalines;
0 v1 v2 v3 v4 v5 1 0.003
1 v1 v2 v3 '' v5 1 0.004
2 v1 v2 v3 '' '' 1 0.007
3 '' v2 v3 '' '' 1 0.01 
4 '' '' v3 '' '' 1 0.02
0 v1 v2 '' v4 v5 2 0.005
1 v1 '' '' v4 v5 2 0.006
2 v1 '' '' '' v5 2 0.02
3 v1 '' '' '' '' 2 0.03
;
run;

This dataset represents an iterative backward selection of model variables. For each outcome, I need to identify the model step at which removing a variable yields an AUC difference of 0.01, and then choose the variables from the previous model step as my final model. I want to do that using a lag function to get the table outlined below.

[Image: mock pic.PNG, mock-up of the desired output table]
Because every row represents a model step (that is, a set of variables and the AUC it leads to),
I would like to select, for each outcome, the row just before the model step that produces the AUC difference of 0.01.

Any help with that, please?

Thanks

4 REPLIES
Ksharp
Super User
data help;
input modelstep var1 $ var2 $ var3 $ var4 $ var5 $ outcome auc;
datalines;
0 v1 v2 v3 v4 v5 1 0.003
1 v1 v2 v3 . v5 1 0.004
2 v1 v2 v3 . . 1 0.007
3 . v2 v3 . . 1 0.01 
4 . . v3 . . 1 0.02
0 v1 v2 . v4 v5 2 0.005
1 v1 . . v4 v5 2 0.006
2 v1 . . . v5 2 0.02
3 v1 . . . . 2 0.03
;
run;

data want;
 set help;
 /* auc from 1 to 4 rows back */
 lag_auc1=lag(auc);  
 lag_auc2=lag2(auc); 
 lag_auc3=lag3(auc); 
 lag_auc4=lag4(auc); 

 /* change in auc relative to 1 to 4 rows back */
 dif_auc1=dif(auc);  
 dif_auc2=dif2(auc); 
 dif_auc3=dif3(auc); 
 dif_auc4=dif4(auc); 

 /* blank out values that reach back into a different outcome */
 if outcome ne lag(outcome)  then call missing(lag_auc1,dif_auc1);
 if outcome ne lag2(outcome) then call missing(lag_auc2,dif_auc2);
 if outcome ne lag3(outcome) then call missing(lag_auc3,dif_auc3);
 if outcome ne lag4(outcome) then call missing(lag_auc4,dif_auc4);

run;
ugly_duck_ling
Calcite | Level 5

Thank you so much for the feedback. I have one more question.

 

If I wanted to flag the model step just before the one where 'dif_auc' is equal to 0.01, how would I code that?

 

I am guessing:

if dif_auc= 0.01 then modelstep= _N_-1

 

I am not sure.

Ksharp
Super User

Is this what you want?

 

data help;
input modelstep var1 $ var2 $ var3 $ var4 $ var5 $ outcome auc;
datalines;
0 v1 v2 v3 v4 v5 1 0.003
1 v1 v2 v3 . v5 1 0.004
2 v1 v2 v3 . . 1 0.007
3 . v2 v3 . . 1 0.01 
4 . . v3 . . 1 0.02
0 v1 v2 . v4 v5 2 0.005
1 v1 . . v4 v5 2 0.006
2 v1 . . . v5 2 0.02
3 v1 . . . . 2 0.03
;
run;

data want;
 set help;
 lag_auc1=lag(auc);  
 lag_auc2=lag2(auc); 
 lag_auc3=lag3(auc); 
 lag_auc4=lag4(auc); 

 dif_auc1=dif(auc);  
 dif_auc2=dif2(auc); 
 dif_auc3=dif3(auc); 
 dif_auc4=dif4(auc); 

 if outcome ne lag(outcome)  then call missing(lag_auc1,dif_auc1);
 if outcome ne lag2(outcome) then call missing(lag_auc2,dif_auc2);
 if outcome ne lag3(outcome) then call missing(lag_auc3,dif_auc3);
 if outcome ne lag4(outcome) then call missing(lag_auc4,dif_auc4);


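/* keep the previous model step only where auc rose by 0.01 within the same outcome; */
/* ROUND guards against floating-point noise in the difference */
lag_modelstep=lag(modelstep);
if round(dif_auc1,1e-6) ne 0.01 or  outcome ne lag(outcome) then call missing(lag_modelstep);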

run;
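
A note on the ROUND call in the last IF statement: the difference of two decimal values stored in binary floating point is not always exactly equal to the decimal literal it appears to match. The following is a minimal sketch, not part of the code above, that repeats the subtraction DIF performs for the last row of outcome 2 (0.03 minus 0.02) and compares an exact test with a rounded one. The variable names d, exact, and rounded are made up for illustration.

```sas
data _null_;
    d = 0.03 - 0.02;                      /* the subtraction DIF performs for outcome 2, step 3 */
    exact   = (d = 0.01);                 /* 1 if the raw difference equals 0.01, else 0 */
    rounded = (round(d, 1e-6) = 0.01);    /* same test after rounding to 6 decimals */
    put d= best32. exact= rounded=;       /* write all three values to the log */
run;
```

If the log shows exact=0 while rounded=1, an unrounded comparison such as dif_auc1 = 0.01 would silently miss that row.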
PGStats
Opal | Level 21

Combine DO UNTIL() loops with BY processing instead of awkward LAG or DIF functions:

 

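data want;
/* first pass over the BY group: remember the last step with auc below 0.01 */
do until(last.outcome);
    set help; by outcome;
    if auc < 0.01 then lastStep = modelstep;
    end;
/* second pass over the same BY group: output only that step */
do until(last.outcome);
    set help; by outcome;
    if modelstep = lastStep then output;
    end;
drop lastStep;
run;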

proc print noobs data=want; run;

[Image: PGStats_0-1655145739562.png, PROC PRINT output of want]

Groups where every auc is below 0.01 and groups where no auc is below 0.01 are both handled appropriately: the first case returns the last model, the second returns no model.
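
The two edge cases can be checked with a small sketch. The dataset edge and the output name want_edge are hypothetical and not part of the original data: outcome 1 has every auc below 0.01, and outcome 2 has none. The sketch relies on lastStep being reset to missing at the start of each DATA step iteration, because it is neither read by SET nor retained, so no value carries over from one outcome to the next.

```sas
/* hypothetical test data: outcome 1 is all below 0.01, outcome 2 is never below 0.01 */
data edge;
input modelstep outcome auc;
datalines;
0 1 0.003
1 1 0.004
2 1 0.007
0 2 0.02
1 2 0.03
;
run;

data want_edge;
/* first pass: remember the last step with auc below 0.01 in this outcome */
do until(last.outcome);
    set edge; by outcome;
    if auc < 0.01 then lastStep = modelstep;
    end;
/* second pass: output that step; nothing matches when lastStep is missing */
do until(last.outcome);
    set edge; by outcome;
    if modelstep = lastStep then output;
    end;
drop lastStep;
run;

proc print noobs data=want_edge; run;
```

Under these assumptions the printed table should contain a single row, model step 2 of outcome 1, and no row for outcome 2.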

PG
