Hi,
I would like to perform the Hosmer and Lemeshow test on my validation set. I have managed to run it on the training data that I used to build the model, but what I really want is to run it on the validation (out-of-sample) data to see whether my model generalizes well. Below is my process for the training data, along with the output. How do I apply the test to the validation data once I have scored it using the model built on the training data? Your help will be much appreciated. Thank you.
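/* Fit on the training data; LACKFIT requests the Hosmer and Lemeshow test */
proc logistic data=Xsell_File DESCENDING plots(only)=roc;
   class VAR1 VAR2 VAR3 VAR4 VAR5;
   model Click_Flag = VAR1 VAR2 VAR3 VAR4 VAR5
         / selection=stepwise lackfit;
   /* Score the validation data with the fitted model */
   SCORE DATA=Validation OUT=Validation_Scores (RENAME=(P_1=p));
run;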
Partition for the Hosmer and Lemeshow Test
Group | Total | Click_Flag = 1 Observed | Click_Flag = 1 Expected | Click_Flag = 0 Observed | Click_Flag = 0 Expected |
10 | 9,236 | 588 | 600.52 | 8,648 | 8,635.48 |
9 | 9,664 | 494 | 496.83 | 9,170 | 9,167.17 |
8 | 9,668 | 472 | 442.13 | 9,196 | 9,225.87 |
7 | 9,656 | 402 | 403.19 | 9,254 | 9,252.81 |
6 | 9,716 | 392 | 374.06 | 9,324 | 9,341.94 |
5 | 9,711 | 327 | 346.88 | 9,384 | 9,364.12 |
4 | 9,612 | 303 | 319.56 | 9,309 | 9,292.44 |
3 | 9,665 | 316 | 297.07 | 9,349 | 9,367.93 |
2 | 9,665 | 267 | 271.16 | 9,398 | 9,393.84 |
1 | 10,047 | 232 | 241.61 | 9,815 | 9,805.39 |
Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square | DF | Pr > ChiSq |
7.08 | 8 | 0.5281 |
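For reference, the chi-square in this output can be reproduced by hand: each group contributes (Observed - Expected)^2 / Expected for both outcome columns, and the default degrees of freedom are the number of groups minus 2. Here is a minimal sketch in a DATA step, using only the numbers from the partition table above. The names hl_partition, hl_check and contrib are made up for illustration.

```sas
/* Partition table from the training output (thousands separators removed). */
data hl_partition;
   input group total obs1 exp1 obs0 exp0;
   /* Contribution of one group: (O - E)^2 / E for events plus non-events */
   contrib = (obs1 - exp1)**2 / exp1 + (obs0 - exp0)**2 / exp0;
   datalines;
10  9236 588 600.52 8648 8635.48
9   9664 494 496.83 9170 9167.17
8   9668 472 442.13 9196 9225.87
7   9656 402 403.19 9254 9252.81
6   9716 392 374.06 9324 9341.94
5   9711 327 346.88 9384 9364.12
4   9612 303 319.56 9309 9292.44
3   9665 316 297.07 9349 9367.93
2   9665 267 271.16 9398 9393.84
1  10047 232 241.61 9815 9805.39
;
run;

/* Accumulate the statistic and compute the p-value on the last row. */
data hl_check;
   set hl_partition end=last;
   chisq + contrib;                      /* sum statement: running total */
   if last then do;
      df = _n_ - 2;                      /* groups minus 2 (the default) */
      p_value = 1 - probchi(chisq, df);  /* upper-tail chi-square probability */
      put chisq= 8.2 df= p_value= 6.4;   /* write the result to the log */
      output;
   end;
   keep chisq df p_value;
run;
```

Summing the ten contributions gives about 7.08 on 8 degrees of freedom, which matches the reported statistic. The same arithmetic can be applied to any grouped table of observed and expected counts, including one built from scored validation data.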
I think you might be able to pull this off by writing your final model parameter estimates to a dataset with option OUTEST= and then calling PROC LOGISTIC again with your validation data set with DATA=, bringing in your previous model with INEST=, preventing a new fit with MAXITER=0 and requesting H-L test with LACKFIT.
Good luck
PG
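The steps described above might look roughly like the sketch below. It has not been run against the poster's data and rests on a few assumptions: Train_Est is a made-up dataset name, the second MODEL statement should list only the effects that stepwise selection actually kept (all five are shown here only as a placeholder), and the CLASS variables in the validation data need to have the same levels as in training so that the parameter names line up with those in the INEST= dataset.

```sas
/* Step 1: fit on the training data and save the final estimates. */
/* Train_Est is a hypothetical dataset name.                      */
proc logistic data=Xsell_File descending outest=Train_Est;
   class VAR1 VAR2 VAR3 VAR4 VAR5;
   model Click_Flag = VAR1 VAR2 VAR3 VAR4 VAR5
         / selection=stepwise lackfit;
run;

/* Step 2: read the saved estimates and apply them to the validation data.  */
/* MAXITER=0 keeps the INEST= values as they are (no refit), and LACKFIT    */
/* requests the Hosmer and Lemeshow test on the validation observations.    */
/* Assumption: replace the effect list with the effects retained in step 1. */
proc logistic data=Validation descending inest=Train_Est;
   class VAR1 VAR2 VAR3 VAR4 VAR5;
   model Click_Flag = VAR1 VAR2 VAR3 VAR4 VAR5
         / maxiter=0 lackfit;
run;
```

One thing to keep in mind when reading the result: by default the test uses the number of groups minus 2 as degrees of freedom, which is meant for the data the model was fitted on. For out-of-sample data some references suggest using the number of groups instead. Recent SAS/STAT releases appear to offer a LACKFIT(DFREDUCE=) suboption for this, but check the documentation for your release before relying on it.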
Hi, I just posted a similar response to Kanyange. You would be best served by posting this question in the SAS STAT Community.
I would consider this more of a stats question than a data mining question.
Thanks,
Jonathan
Hi Jonathan,
I am quite confused. I thought that modelling was part of data mining? Also, this test helps to validate the model, to see whether your actual and predicted values are similar. As far as I am aware, this is data mining.
Thanks
I think DataMining here refers to the Enterprise Miner software. Logistic regression is a part of data mining though, and is part of the e-Miner software suite.
Hi Reeza,
Thanks for your response. I am having trouble getting an answer on this, could you please help? Does EM have this test? Many thanks
Thank you very much PG, that's really helpful.