BookmarkSubscribeRSS Feed
ugly_duck_ling
Calcite | Level 5

I have a dataset that I need help with.

 

data help;
input modelstep var1 $ var2 $ var3 $ var4 $ var5 $ outcome auc;
datalines;
0 v1 v2 v3 v4 v5 1 0.003
1 v1 v2 v3 '' v5 1 0.004
2 v1 v2 v3 '' '' 1 0.007
3 '' v2 v3 '' '' 1 0.01 
4 '' '' v3 '' '' 1 0.02
0 v1 v2 '' v4 v5 2 0.005
1 v1 '' '' v4 v5 2 0.006
2 v1 '' '' '' v5 2 0.02
3 v1 '' '' '' '' 2 0.03
;
run;

This dataset represents an iterative process of backward selection of model variables where
I am required to identify (for each outcome) the model step at which a specified number of variables yields
an AUC difference of 0.01, and then specify the previous model step variables as my model of choice. I want to do that using a lag function to get the table outlined below.

mock pic.PNG
Because every row represents a model step (that is a number of variables leading to the desired AUC difference,
I would like to specify the row just before the model leading to the AUC difference of 0.01 for each outcome.*/

Any help with that, please?

Thanks

4 REPLIES 4
Ksharp
Super User
data help;
input modelstep var1 $ var2 $ var3 $ var4 $ var5 $ outcome auc;
datalines;
0 v1 v2 v3 v4 v5 1 0.003
1 v1 v2 v3 . v5 1 0.004
2 v1 v2 v3 . . 1 0.007
3 . v2 v3 . . 1 0.01 
4 . . v3 . . 1 0.02
0 v1 v2 . v4 v5 2 0.005
1 v1 . . v4 v5 2 0.006
2 v1 . . . v5 2 0.02
3 v1 . . . . 2 0.03
;
run;

data want;
 set help;
 lag_auc1=lag(auc);  
 lag_auc2=lag2(auc); 
 lag_auc3=lag3(auc); 
 lag_auc4=lag4(auc); 

 dif_auc1=dif(auc);  
 dif_auc2=dif2(auc); 
 dif_auc3=dif3(auc); 
 dif_auc4=dif4(auc); 

 if outcome ne lag(outcome)  then call missing(lag_auc1,dif_auc1);
 if outcome ne lag2(outcome) then call missing(lag_auc2,dif_auc2);
 if outcome ne lag3(outcome) then call missing(lag_auc3,dif_auc3);
 if outcome ne lag4(outcome) then call missing(lag_auc4,dif_auc4);

run;
ugly_duck_ling
Calcite | Level 5

Thank you so much for the feedback. I have one more question.

 

If I wanted to specify the model step before which a 'dif_auc' was equal to 0.01, how can I code for that?

 

I am guessing:

if dif_auc= 0.01 then modelstep= _N_-1

 

I am not sure.

Ksharp
Super User

You want this ?

 

data help;
input modelstep var1 $ var2 $ var3 $ var4 $ var5 $ outcome auc;
datalines;
0 v1 v2 v3 v4 v5 1 0.003
1 v1 v2 v3 . v5 1 0.004
2 v1 v2 v3 . . 1 0.007
3 . v2 v3 . . 1 0.01 
4 . . v3 . . 1 0.02
0 v1 v2 . v4 v5 2 0.005
1 v1 . . v4 v5 2 0.006
2 v1 . . . v5 2 0.02
3 v1 . . . . 2 0.03
;
run;

data want;
 set help;
 lag_auc1=lag(auc);  
 lag_auc2=lag2(auc); 
 lag_auc3=lag3(auc); 
 lag_auc4=lag4(auc); 

 dif_auc1=dif(auc);  
 dif_auc2=dif2(auc); 
 dif_auc3=dif3(auc); 
 dif_auc4=dif4(auc); 

 if outcome ne lag(outcome)  then call missing(lag_auc1,dif_auc1);
 if outcome ne lag2(outcome) then call missing(lag_auc2,dif_auc2);
 if outcome ne lag3(outcome) then call missing(lag_auc3,dif_auc3);
 if outcome ne lag4(outcome) then call missing(lag_auc4,dif_auc4);


lag_modelstep=lag(modelstep);
if round(dif_auc1,1e-6) ne 0.01 or  outcome ne lag(outcome) then call missing(lag_modelstep);

run;
PGStats
Opal | Level 21

Combine DO UNTIL() loops with BY processing instead of awkward LAG or DIF functions :

 

data want;
do until(last.outcome);
    set help; by outcome;
    if auc < 0.01 then lastStep = modelstep;
    end;
do until(last.outcome);
    set help; by outcome;
    if modelstep = lastStep then output;
    end;
drop lastStep;
run;

proc print noobs data=want; run;

PGStats_0-1655145739562.png

All auc < 0.01 and no auc < 0.01 cases will be handled appropriately, i.e. by returning the last model and no model, respectively.

PG

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1381 views
  • 1 like
  • 3 in conversation