Hi,
I would like to run the Hosmer and Lemeshow test on my validation set. I have managed to run it on the training data that I used to build the model, but what I really want is to run it on the validation (out-of-sample) data to see whether my model generalizes well. Below are my code and the output for the training data. How can I apply the test to the validation data once I have scored it with the model built on the training data? Your help will be much appreciated. Thank you.
proc logistic data=Xsell_File DESCENDING plots(only)=roc;
  class VAR1 VAR2 VAR3 VAR4 VAR5;
  model Click_Flag = VAR1 VAR2 VAR3 VAR4 VAR5
        / selection=stepwise lackfit;
  SCORE DATA=Validation OUT=Validation_Scores (RENAME=(P_1=p));
run;
Partition for the Hosmer and Lemeshow Test

| Group | Total | Click_Flag = 1 Observed | Click_Flag = 1 Expected | Click_Flag = 0 Observed | Click_Flag = 0 Expected |
|---|---|---|---|---|---|
| 10 | 9,236 | 588 | 600.52 | 8,648 | 8,635.48 |
| 9 | 9,664 | 494 | 496.83 | 9,170 | 9,167.17 |
| 8 | 9,668 | 472 | 442.13 | 9,196 | 9,225.87 |
| 7 | 9,656 | 402 | 403.19 | 9,254 | 9,252.81 |
| 6 | 9,716 | 392 | 374.06 | 9,324 | 9,341.94 |
| 5 | 9,711 | 327 | 346.88 | 9,384 | 9,364.12 |
| 4 | 9,612 | 303 | 319.56 | 9,309 | 9,292.44 |
| 3 | 9,665 | 316 | 297.07 | 9,349 | 9,367.93 |
| 2 | 9,665 | 267 | 271.16 | 9,398 | 9,393.84 |
| 1 | 10,047 | 232 | 241.61 | 9,815 | 9,805.39 |

Hosmer and Lemeshow Goodness-of-Fit Test

| Chi-Square | DF | Pr > ChiSq |
|---|---|---|
| 7.08 | 8 | 0.5281 |
I think you might be able to pull this off by writing your final model's parameter estimates to a data set with the OUTEST= option, then calling PROC LOGISTIC again on your validation data set (DATA=), reading in the previously fitted model with INEST=, preventing a new fit with MAXITER=0, and requesting the H-L test with LACKFIT.
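The steps above could look roughly like the sketch below. It is untested against the poster's data and rests on a few assumptions: `Train_Est` is a hypothetical data set name, and `VAR1 VAR2 VAR3` in the second step are placeholders for whichever effects stepwise selection actually kept, since the second MODEL statement should list only the effects that have estimates in the OUTEST= data set.

```sas
/* Step 1: fit on the training data and save the final estimates.      */
/* Train_Est is a hypothetical name for the OUTEST= data set.          */
proc logistic data=Xsell_File descending outest=Train_Est;
  class VAR1 VAR2 VAR3 VAR4 VAR5;
  model Click_Flag = VAR1 VAR2 VAR3 VAR4 VAR5 / selection=stepwise;
run;

/* Step 2: apply the saved estimates to the validation data.           */
/* INEST= supplies the starting values, MAXITER=0 stops PROC LOGISTIC  */
/* from iterating away from them, and LACKFIT requests the Hosmer and  */
/* Lemeshow test on the validation observations.                       */
/* Assumption: VAR1 VAR2 VAR3 stand in for the effects that stepwise   */
/* selection retained in step 1; replace them with the real list.      */
proc logistic data=Validation descending inest=Train_Est;
  class VAR1 VAR2 VAR3;
  model Click_Flag = VAR1 VAR2 VAR3 / maxiter=0 lackfit;
run;
```

Two things are worth checking before trusting the result. The CLASS variables need the same levels and the same coding in both data sets, so that the parameter names in `Train_Est` line up with the design columns built from `Validation`. Also, a note or warning about convergence is to be expected with MAXITER=0, and the default degrees of freedom of the test assume the model was fitted to the same data; recent SAS/STAT releases document a DFREDUCE= suboption of LACKFIT for that situation, so it may be worth looking up in the documentation for your release.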
Good luck
PG
Hi, I just posted a similar response to Kanyange. You would be best served by posting this question in the SAS STAT Community.
I would consider this more of a stat question than a data mining question.
Thanks,
Jonathan
Hi Jonathan,
I am quite confused. I thought that modelling was part of data mining? This test also helps to validate the model, to see whether the actual and predicted values are similar. As far as I am aware, that is data mining.
Thanks
I think "Data Mining" here refers to the Enterprise Miner software. Logistic regression is a part of data mining, though, and is included in the Enterprise Miner software suite.
Hi Reeza,
Thanks for your response. I am having trouble getting an answer on this. Could you please help? Does EM have this test? Many thanks.
Thank you very much PG, that's really helpful.