I have developed several predictive (logit) models, all with sample sizes of roughly 10-20K or more. Typically my model results are positive: the global test rejects that all betas = 0, there is no multicollinearity, the coefficients are significant, the AUC is high, and the percentage of correct predictions is high for both binary groups (0s and 1s). However, when I run the Hosmer-Lemeshow test (in base SAS), it is significant most of the time, indicating lack of fit.

I've read in several places (see links and refs below, though few are 'hard' references) that "As the sample size gets large, the H-L statistic can find smaller and smaller differences between observed and model-predicted values to be significant," leading to an erroneous conclusion about model fit. Do you agree? Do you have any stronger references than the ones I have? If you agree, then how large is too large when it comes to sample size?

And, since SAS Enterprise Miner doesn't include the HL test (I typically use a code node to run it in EM), I'm curious whether people care much about it. In fact, SAS tech support told me it was not included as an option in EM because the HL test is a holdover from the past and basically doesn't fit the data-mining paradigm (i.e., large data sets?) that SAS EM is built upon. Thanks in advance.

References: an NCSU faculty web site, the Stata listserv, a quality forum (p. 5), another academic web page, and:

Feudtner C, Hexem KR, Shabbout M, Feinstein JA, Sochalski J, Silber JH. "Prediction of Pediatric Death in the Year after Hospitalization: A Population-Level Retrospective Cohort Study." Journal of Palliative Medicine, 12(2), 2009. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2656437/ "The Hosmer-Lemeshow test detected a statistically significant degree of miscalibration in both models, due to the extremely large sample size of the models, as the differences between the observed and expected values within each group are relatively small."
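To make the sample-size objection concrete, here is a minimal, stylized sketch (in Python, since the SAS code node isn't shown) of why the HL chi-square grows with n. The decile rates and the fixed miscalibration `delta = 0.01` are made-up assumptions for illustration; the statistic is the standard HL form written in terms of per-group rates, H = sum n_g (o_g - e_g)^2 / (e_g (1 - e_g)). Because H scales linearly with n for a fixed miscalibration, the same tiny, practically irrelevant 1-percentage-point discrepancy is non-significant at n = 10,000 but strongly significant at n = 100,000.

```python
# Stylized demo: a fixed small miscalibration (delta) in every decile makes the
# Hosmer-Lemeshow chi-square grow linearly with n, so it eventually "rejects"
# no matter how good the calibration is in practical terms.

def hl_statistic(groups):
    """HL chi-square from per-decile (n_g, observed_rate, expected_rate)."""
    h = 0.0
    for n_g, obs, exp in groups:
        h += n_g * (obs - exp) ** 2 / (exp * (1 - exp))
    return h

delta = 0.01                     # assumed constant miscalibration per decile
exp_rates = [0.05, 0.10, 0.15, 0.20, 0.25,
             0.30, 0.35, 0.40, 0.45, 0.50]  # hypothetical decile means
crit = 15.51                     # chi-square 0.95 quantile, df = 10 - 2 = 8

for n in (1_000, 10_000, 100_000):
    n_g = n // 10                # equal-sized deciles for simplicity
    h = hl_statistic([(n_g, e + delta, e) for e in exp_rates])
    print(n, round(h, 2), h > crit)
# H is roughly 0.73, 7.3, and 73: only the largest sample "fails" the test,
# even though the miscalibration is identical in all three cases.
```

This is the mechanism behind the quoted claim: for fixed observed-minus-expected differences, the test statistic is proportional to n, so significance is essentially guaranteed once n is large enough.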