Hi,
I would like to perform the Hosmer and Lemeshow test on my validation set. I have managed to run it on the training set that I used to build the model, but what I really want is to run it on the validation sample (out of sample) to see whether my model generalizes well. Below is my process for the training set, followed by the output. I would like to know how to apply the test to the validation set once I have scored it with the model built on the training set. Your help will be much appreciated. Thank you.
proc logistic data=Xsell_File descending plots(only)=roc;
   class VAR1 VAR2 VAR3 VAR4 VAR5;
   model Click_Flag = VAR1 VAR2 VAR3 VAR4 VAR5
         / selection=stepwise lackfit;
   score data=Validation out=Validation_Scores (rename=(P_1=p));
run;
Partition for the Hosmer and Lemeshow Test
Group | Total | Click_Flag = 1 Observed | Click_Flag = 1 Expected | Click_Flag = 0 Observed | Click_Flag = 0 Expected
10 | 9,236 | 588 | 600.52 | 8,648 | 8,635.48
9 | 9,664 | 494 | 496.83 | 9,170 | 9,167.17
8 | 9,668 | 472 | 442.13 | 9,196 | 9,225.87
7 | 9,656 | 402 | 403.19 | 9,254 | 9,252.81
6 | 9,716 | 392 | 374.06 | 9,324 | 9,341.94
5 | 9,711 | 327 | 346.88 | 9,384 | 9,364.12
4 | 9,612 | 303 | 319.56 | 9,309 | 9,292.44
3 | 9,665 | 316 | 297.07 | 9,349 | 9,367.93
2 | 9,665 | 267 | 271.16 | 9,398 | 9,393.84
1 | 10,047 | 232 | 241.61 | 9,815 | 9,805.39
Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square | DF | Pr > ChiSq
7.08 | 8 | 0.5281
I think you might be able to pull this off in two steps. First, write your final model parameter estimates to a data set with the OUTEST= option. Then call PROC LOGISTIC again on your validation data set (DATA=), bringing in your previous model with INEST=, preventing a new fit with MAXITER=0, and requesting the H-L test with LACKFIT.
Good luck
PG
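A minimal sketch of the two steps described above, using the data set and variable names from the original post. Train_Est is a made-up data set name, and the effect list in step 2 is only a placeholder: with SELECTION=STEPWISE, the second MODEL statement should list just the effects that were actually retained, which the post does not show. Treat this as an untested outline rather than a verified program.

```sas
/* Step 1: fit on the training data and save the final estimates.  */
/* Train_Est is a hypothetical name for the OUTEST= data set.      */
proc logistic data=Xsell_File descending outest=Train_Est;
   class VAR1 VAR2 VAR3 VAR4 VAR5;
   model Click_Flag = VAR1 VAR2 VAR3 VAR4 VAR5
         / selection=stepwise lackfit;
run;

/* Step 2: apply the saved estimates to the validation data.       */
/* MAXITER=0 stops PROC LOGISTIC from re-estimating, so LACKFIT    */
/* is computed from the training coefficients.                     */
/* ASSUMPTION: VAR1-VAR3 stand in for whichever effects stepwise   */
/* selection kept; replace them with the real final effect list.   */
proc logistic data=Validation descending inest=Train_Est;
   class VAR1 VAR2 VAR3;
   model Click_Flag = VAR1 VAR2 VAR3
         / maxiter=0 lackfit;
run;
```

A few things to watch for. With MAXITER=0 the log will most likely show a convergence warning, which is expected here because no fitting is intended. The CLASS variables in the validation data should have the same levels and the same parameterization as in training, otherwise the parameter names in Train_Est may not line up with the new design columns. Finally, the default degrees of freedom (number of groups minus 2) assume the model was fit to the same data being tested; for an external sample Hosmer and Lemeshow suggest using the number of groups instead, so the p-value printed in step 2 is probably best read as approximate.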
Hi, I just posted a similar response to Kanyange. You would be best served by posting this question in the SAS STAT Community.
I would consider this more of a stat question than a data mining question.
Thanks,
Jonathan
Hi Jonathan,
I am quite confused. I thought that modelling is part of data mining? Also, this test helps to validate the model, to see whether your actual and predicted values are similar. As far as I am aware, this is data mining.
Thanks
I think DataMining here refers to the Enterprise Miner software. Logistic regression is a part of data mining though, and is part of the e-Miner software suite.
Hi Reeza,
Thanks for your response. I am having trouble getting an answer on this, could you please help? Does EM have this test? Many thanks
Thank you very much PG, that's really helpful...