Dear all,
I am trying to externally validate a logistic regression model.
From my reading - I have identified I need to
1. calculate the Linear Predictor,
2. Transport the original regression coefficients to the external dataset and calculate the linear predictor.
3. Determine the amount of miscalibration in intercept and calibration slope.
4. Determine Discrimination and Calibration in the external dataset
I have the coefficients and values required for step 1 but do not know what code I'd need to achieve these steps.
Does anyone have code that would work for this please?
Many thanks,
Craig
See this note (http://support.sas.com/kb/22630) that discusses evaluating calibration and discrimination in logistic models.
Using a separate data set for evaluation, the misclassification error rate and several other statistics can be obtained for training and test (external) data sets using the PARTITION statement in PROC HPLOGISTIC. The input data set should combine the two data sets with a variable that identifies the two. The specified model will be fit to the training data and evaluated using multiple statistics for both parts of the data. See the example titled "Partitioning Data" in the HPLOGISTIC documentation.
Het Stat Dave,
Thank you for your explanation - that resource for discrimination / calibration is helpful - I have used several of these during model development 🙂
So to clarify - do I "merge" both datasets? Or what is the process for this? I have the coefficients from the development model but am unsure how to proceed from there.
Have a nice Easter,
Craig
Hey StatsDave,
Thanks for the answer. Ah I see - thank you for the explanation. I was informed that coefficients i.e the Linear Predictor needed to be the same for validation - I'm assuming this method does this and fits TRIPOD guidelines?
I'll draw up some code and can try with a colleague next week and report back.
Many thanks,
Craig
Hi Stats Dave,
Thanks for your advice - I've drawn up some code using the links your provided. Can I just check this code with yourself - variable names should match and I'm looking to have ROC values with CIs / plot and calibration plot ?
Best wishes,
Craig
*XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX !! TEST CODE FOR VALIDATION !! N.B ORDER OF CASE AND CONTROL?? XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX; *PROC HPLOGISTIC DATA = BOTH DESCENDING plots=calibration(CLM ShowObs); FORMAT SEX SEX. EDUCATC EDUCATC. SMOKE SMOKE. DRINK DRINK. ALCOIC ALCOIC. ; CLASS SEX ( REF = '0: Female' ) EDUCATC ( REF = '16+ Years' ) SMOKE ( REF = '0: Never') DRINK (REF = '0: Never') ALCOIC (REF = '6: Never') ; MODEL HNC = AGE SEX EDUCATC SMOKE SMOKP DRINK ALCOIC / ROCCI ROCCONTRAST RL GOF; RUN;
PROC HPLOGISTIC does not have the same statements and options as PROC LOGISTIC. See its documentation. But if what you want is the calibration plot and the area under the ROC curve (AUC) for your test data, then you could just use PROC LOGISTIC as shown in the example titled "Goodness-of-Fit Tests and Calibration" in the LOGISTIC documentation. That example fits the model to a set of training data and the fitted model is then evaluated using a set of test data. The two data sets are stacked as we discussed. At the end of the example, PROC LOGISTIC is again run to evaluate the model on the test portion of the data. That PROC LOGISTIC could be modified as below to provide the AUC and the Brier score that is mentioned in the note I referred to earlier.
proc logistic data=Scored(where=(InLaborForce2=.))
plots(only)=calibration(clm showobs);
model InLaborForce = XBeta / gof;
score data=Scored(where=(InLaborForce2=.)) out=_null_ fitstat;
roc;
test intercept = 0, XBeta = 1;
run;
Forgive me, I am struggling to follow this.
So I need to validate the model using the stacked data, assess ROC with confidence intervals and assess calibration. The validation needs to use the same coefficients from the original dataset. The variables and reference values shouldn't change but how would I rewrite it so these things exist for HPLogistic please? Could you even modify the statement below to show me what needs changed please?
Many thanks,
Craig
*PROC LOGISTIC DATA = BOTH plots=calibration(CLM ShowObs); FORMAT SEX SEX. EDUCATC EDUCATC. SMOKE SMOKE. DRINK DRINK. ALCOIC ALCOIC. ; CLASS SEX ( REF = '0: Female' ) EDUCATC ( REF = '16+ Years' ) SMOKE ( REF = '0: Never') DRINK (REF = '0: Never') ALCOIC (REF = '6: Never') ; MODEL HNC = AGE SEX EDUCATC SMOKE SMOKP DRINK ALCOIC / ROCCI ROCCONTRAST RL GOF; RUN;
Thanks Dave
I'll try this tomorrow and report back 🙂
Sorry for the delay in getting back - swamped w things.
So I have attempted to make sure all variables are the same in nature and format. Only exception is outcome variable but if this is binary guessing it doesn't matter?
I have exported the model coefficients from the development file into a .rtf file. - is this correct or what would be the next steps?
Many thanks,
Craig
FILENAME COEF 'J:\MED\DentalSchool\DPHU\CraigSmith\ARCAGE\model_coef.txt.'; /* Step 2: Load the coefficients from the file */ data COEF; infile 'model_coef.txt'; input VARIABLE : $20. ESTIMATE; run; /* Step 3: Merge the coefficients with the validation dataset */ proc sql; create table VALID_COEF as select A.*, B.ESTIMATE from VALID A left join COEF B on A.VARIABLE = B.VARIABLE; quit; /* Step 4: Calculate the predicted probabilities */ data VALID_PRED; set VALID_COEF; PROB = 1 / (1 + exp(-(& 0.4692. + ESTIMATE))); run; /* Step 5: Calculate the model performance */ proc logistic data=VALID_PRED; model HNC (event='1') = PROB / nofit; roc 'Model Performance' / ci=none; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.