05-10-2017 06:04 AM - edited 05-10-2017 06:25 AM
I want to make a ROC curve for a hold-out sample. I fitted a logistic regression to data from the year 2007, and I want to see how well this model fits the data from the year 2008. I can't use this code:
proc logistic data = sasdata.Data2008;
model flag(event='1')=TL_TA EAT_TA AGE / outroc=r;
run;
because then both my model and my ROC curve are based on a logistic regression fitted to the 2008 data set. I want to fit the logistic regression on the 2007 set and then see how well that fit performs on the 2008 data set. So I tried this:
proc logistic data=sasdata.data2007;
class AGE (ref='Ny') / param = ref;
model flag(event="1") = TL_TA EAT_TA AGE / CTABLE outroc=troc;
score data=sasdata.data2008 out=valpred outroc=vroc;
run;
This seems OK. I get one ROC curve for the fit on 2007 and another ROC curve for how the 2007 model performs on the 2008 data. The thing is, I want to find the optimal cutoff point for 2008, that is, the point on the ROC curve with the smallest Euclidean distance to the top-left corner (0,1). How can I do that? The CTABLE option only covers the 2007 data set.
I hope you can help.
05-11-2017 10:22 AM
See the ROCPLOT macro. Specify the SCORE OUTROC= data set in the INROC= macro option, and the SCORE OUT= data set and its predicted probabilities in the macro's INPRED= and P= options. See the macro documentation for information on the various optimality criteria you can use.
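If you only need the minimum-distance cutoff and not the plot, a DATA step on the SCORE OUTROC= data set may be enough. This is just a minimal sketch, not the macro's own method: it assumes vroc (from the SCORE statement in the question) contains the usual OUTROC= variables _PROB_, _SENSIT_ and _1MSPEC_, and the names vroc_dist and dist are made up for illustration.

```sas
/* Sketch: distance from each 2008 ROC point to the ideal corner (0,1). */
/* Assumes vroc has the standard OUTROC= variables _PROB_, _SENSIT_,    */
/* and _1MSPEC_; vroc_dist and dist are hypothetical names.             */
data vroc_dist;
   set vroc;
   dist = sqrt((1 - _SENSIT_)**2 + _1MSPEC_**2);
run;

/* Smallest distance first */
proc sort data=vroc_dist;
   by dist;
run;

/* The first row holds the cutoff probability closest to (0,1) */
proc print data=vroc_dist(obs=1);
   var _PROB_ _SENSIT_ _1MSPEC_ dist;
run;
```

If several cutoffs tie on distance, this only prints whichever one the sort puts first, so it is worth looking at the first few rows as well.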