09-23-2012 05:04 PM
Hi, I am trying to validate a prediction model using SAS. I need to apply the model with the coefficients from the previous study and calculate the area under the ROC curve. I know how to get the ROC curve and AUC by fitting the covariates, but I can't figure out how to use the fixed coefficients instead. Does anyone know how to do it? Thanks
09-24-2012 10:14 AM
One way is to manually compute the logit of your existing model and use that as the single covariate in PROC LOGISTIC to get the ROC and AUC. That would tell you how well that model discriminates in your data.
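A minimal sketch of this approach, using the example coefficients quoted later in this thread (2.5, 1.2, 3.2). The data set name `mydata`, the 0/1 outcome `y`, and the variable `xbeta` are hypothetical names chosen for illustration. Because the AUC depends only on the rank order of the predictions, refitting with the logit as the only covariate should not change it, provided the fitted slope on `xbeta` comes out positive.

```sas
/* Compute the linear predictor (logit) from the published coefficients. */
/* mydata, y, x1, x2 and xbeta are assumed names, not from the thread.   */
data scored;
   set mydata;
   xbeta = 2.5 + 1.2*x1 + 3.2*x2;
run;

ods graphics on;

/* Fit the logit as the only covariate. The "c" statistic in the      */
/* association table is the area under the ROC curve.                 */
proc logistic data=scored plots(only)=roc;
   model y(event='1') = xbeta;
run;
```

The fitted intercept and slope also give a rough calibration check: an intercept near 0 and a slope near 1 would suggest the published model is reasonably calibrated in the new data.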
If you wish to know whether the existing model could be improved, you could include the logit and all the original variables. If any of the original variables are significant, that is an indication that the model could be improved. You could also compare the two ROC curves to get a summary measure of improvement (see Rick Wicklin's previous post on this). One thing to be careful of here is to make sure that your 'new' data has a large enough sample size to support all the variables.
09-24-2012 03:29 PM
I am a little confused. Do you wish to use existing coefficients and score new data? You might use the SCORE statement and the INMODEL= option, as in Example 54.15 from the SAS/STAT 12.1 documentation for PROC LOGISTIC.
If you wish to compare ROC curves, and have version 12.1, then you should look at the ROCCONTRAST statement.
If I have misunderstood your question, I apologize.
09-24-2012 04:26 PM
Maybe what I am asking is not clear. I don't think I am trying to score new data.
I already have a data set, and I have a prediction model from a previous study that I am trying to validate on my data (external validation). I do not have access to the derivation data from the previous study; I just have the model. Let's say the final prediction model from the previous study is: exp(2.5 + 1.2*x1 + 3.2*x2) / (1 + exp(2.5 + 1.2*x1 + 3.2*x2))
I am using calibration and discrimination to do the external validation of the model. I need to find the area under the curve to see if I get the same results as in the derivation data.
After finding this area, I will also run the logistic regression with the variables to see if I get the same coefficients and AUC. But the first step of validation is to see whether the model works in the new data, and hence I am trying to find the area under the ROC curve.
I hope I have made my question a little clearer this time.
Thanks for the help
09-25-2012 07:17 AM
Much clearer. Let's see. You have estimates of the parameters, and you have your data. You wish to run your data through the final model, with the known parameter estimates, and get an ROC. The Details section of the documentation has a section "Receiver Operating Characteristic Curves" and in the Comparing ROC Curves section says:
"ROC curves can be created from each model fit in a selection routine, from the specified model in the MODEL statement, from specified models in ROC statements, or from input variables which act as (pi hat) in the preceding discussion."
This sounds like what you want (it also sounds, to me, like scoring your data using the final model). Unfortunately, I cannot find an example where input variables are used. I do know that you would need to calculate the predicted probabilities in a dataset, but the devil is in the details, and that is where I am stuck as well. Perhaps someone can step in and help.
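One possible shape for those details, offered as an untested sketch rather than a confirmed solution. It assumes a release whose ROC statement accepts `PRED=` for an input variable holding predicted probabilities, and it uses the example coefficients from earlier in the thread. The names `mydata`, `y`, `xbeta` and `phat` are hypothetical.

```sas
/* Predicted probabilities from the published model.               */
/* mydata, y, x1, x2, xbeta and phat are assumed names.            */
data scored;
   set mydata;
   xbeta = 2.5 + 1.2*x1 + 3.2*x2;
   phat  = exp(xbeta) / (1 + exp(xbeta));
run;

ods graphics on;

proc logistic data=scored;
   /* NOFIT is intended to skip fitting the MODEL statement, so the */
   /* ROC analysis uses only the probabilities supplied in phat.    */
   model y(event='1') = / nofit;
   roc 'Published model' pred=phat;
run;
```

If this runs as intended, the ROC association output should report the area under the curve for `phat` directly, with no coefficients re-estimated from the new data.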