I'm trying to calculate the AUC on a holdout test (or validation) data set. My model has a fairly good AUC on the training data (0.87), but I would like to see whether it performs well out of sample.
Let's say the original dataset contains three variables: Y, X1, and X2. I split this dataset into two smaller datasets, XTRAIN and XTEST.
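One way to do such a split in SAS is sketched below. This is a minimal, self-contained example on hypothetical simulated data: the data set HAVE, the seeds, the coefficients, and the 70/30 ratio are assumptions for illustration, not part of the original post.

```sas
/* Hypothetical example data: Y is binary (0/1), X1 and X2 are numeric. */
data have;
   call streaminit(12345);                 /* fixed seed: reproducible data */
   do id = 1 to 1000;
      x1 = rand('normal');
      x2 = rand('normal');
      y  = rand('bernoulli', logistic(-0.5 + x1 - 0.5*x2));
      output;
   end;
run;

/* Random (approximately) 70/30 split into training and holdout test sets. */
data xtrain xtest;
   set have;
   if _n_ = 1 then call streaminit(54321); /* fixed seed: reproducible split */
   if rand('uniform') < 0.7 then output xtrain;
   else output xtest;
run;
```

Because each row is assigned independently, the split is only approximately 70/30; PROC SURVEYSELECT is an alternative if an exact sample size is needed.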
These are the steps I followed.
First, I trained my model on the training dataset XTRAIN:
proc logistic data = XTRAIN outmodel= MODEL1 ;
model Y (EVENT = '1')= X1 X2 ;
run;
Next, I used the model to make predictions on the test dataset:
proc logistic inmodel = MODEL1 ;
score data = XTEST out = YPRED_test (rename = (P_1 = YPRED));
run;
Finally, I used these predictions to plot the ROC curve and calculate the test AUC:
/* DATA= must name the OUT= data set created by the SCORE step */
proc logistic data=YPRED_test;
model Y(event="1")=;
roc pred=YPRED;
ods select ROCOVERLAY;
run;
I just want to check whether these steps are correct. They are the same steps I have used for out-of-sample model validation in R and Python.
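To get the test AUC as a number instead of reading it off the plot, the ROC association table can be saved with ODS OUTPUT. Below is a minimal, self-contained sketch of the three steps above on hypothetical simulated data (seed and coefficients are assumptions). It assumes the ODS table name is ROCAssociation; `ods trace on;` lists the actual table names on your release. It uses NOFIT, as in the sashelp.heart example in this thread, so no model is refit on the test predictions.

```sas
/* Hypothetical simulated XTRAIN/XTEST so the sketch runs on its own. */
data xtrain xtest;
   call streaminit(2024);
   do id = 1 to 1000;
      x1 = rand('normal');
      x2 = rand('normal');
      y  = rand('bernoulli', logistic(-0.5 + x1 - 0.5*x2));
      if rand('uniform') < 0.7 then output xtrain;
      else output xtest;
   end;
run;

/* Step 1: fit on the training data and save the fitted model. */
proc logistic data=xtrain outmodel=model1;
   model y(event='1') = x1 x2;
run;

/* Step 2: score the test data with the saved model; P_1 is Pr(Y=1). */
proc logistic inmodel=model1;
   score data=xtest out=ypred_test(rename=(P_1=ypred));
run;

/* Step 3: ROC/AUC of the saved predictions without fitting a new model. */
proc logistic data=ypred_test;
   model y(event='1') = ypred / nofit;
   roc 'Test' pred=ypred;
   ods output ROCAssociation=test_auc;   /* table includes the AUC (Area) */
run;

proc print data=test_auc;
run;
```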
Okay, I re-read it and now realize where I was getting confused. I just had a hard time understanding that you can fit the model and calculate prediction scores for different datasets in the same PROC LOGISTIC step.
This is ultimately the fastest way to compare training and test AUC:
proc logistic data=train;
   model y(event="1") = x1 x2;
   score data=train fitstat;
   score data=valid fitstat;
run;
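If you want the training and test AUC side by side in a data set rather than only in the printed output, the FITSTAT tables can be captured with ODS OUTPUT. A minimal sketch on hypothetical simulated TRAIN and VALID data; it assumes the ODS table produced by FITSTAT is named ScoreFitStat (run `ods trace on;` to confirm the name on your release).

```sas
/* Hypothetical simulated TRAIN and VALID data sets so the sketch runs alone. */
data train valid;
   call streaminit(2024);
   do id = 1 to 1000;
      x1 = rand('normal');
      x2 = rand('normal');
      y  = rand('bernoulli', logistic(-0.5 + x1 - 0.5*x2));
      if rand('uniform') < 0.7 then output train;
      else output valid;
   end;
run;

/* Fit once on TRAIN, then score both data sets in the same step.
   FITSTAT reports fit statistics (including AUC) for each scored data set;
   ODS OUTPUT saves those tables (assumed name: ScoreFitStat). */
proc logistic data=train;
   model y(event='1') = x1 x2;
   score data=train fitstat;
   score data=valid fitstat;
   ods output ScoreFitStat=score_fit;
run;

proc print data=score_fit;
run;
```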
https://support.sas.com/kb/39/724.html
That link is fairly old, so there may be newer code that does the same thing (or perhaps there have always been two ways to do this).
You can try both approaches and check whether the results are the same. You can also try running the second piece of code without the MODEL statement and see what happens.
proc logistic data=sashelp.heart;
   model status(event='Dead') = weight height / nofit;
   roc 'weight' pred=weight;
   roc 'height' pred=height;
run;