06-24-2016 09:39 AM
I've been exploring the PARTITION statement in PROC HPLOGISTIC, which splits a dataset into training and validation portions so that a model fit on the former can be tested on the latter. I'm getting different forward selection results from the training data depending on whether I identify it with the PARTITION statement or simply fit the model on a dataset that contains only the training observations. Everything else is identical: the same observations and the same selection and stop criteria. Even the first seven variables selected are identical, with the same chi-square and likelihood statistics. After that point, though, the model fit statistics start diverging. Why would this happen if the exact same data is being used?
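
For concreteness, here is a minimal sketch of the two fits being compared. The dataset name `work.all`, the response `y`, the predictors `x1-x20`, and the role variable `role` with values `'TRAIN'` and `'VALID'` are all placeholders rather than the actual names, and the selection and stop criteria are left at their defaults here; in the real runs they are set identically in both steps.

```sas
/* Placeholder names throughout: work.all, y, x1-x20, role. */

/* Fit 1: full dataset, training rows identified by the PARTITION statement */
proc hplogistic data=work.all;
   model y(event='1') = x1-x20;
   selection method=forward;
   partition rolevar=role(train='TRAIN' validate='VALID');
run;

/* Fit 2: same model, but the input is restricted to the training rows only */
proc hplogistic data=work.all(where=(role='TRAIN'));
   model y(event='1') = x1-x20;
   selection method=forward;
run;
```

The only intended difference between the two steps is how the training observations are identified; comparing the selection summary tables from each run shows the step at which the results diverge.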