06-20-2016 01:36 PM
I have a large dataset (19k) and I am using logistic regression to estimate patient-level probabilities of experiencing an event. I am interested in the effect of a facility characteristic on the odds of that event. The facility characteristic is naturally continuous, and I have divided it into quartiles. I calculated the crude event rate within each quartile.
When I request the predicted probabilities from a logistic regression model with patient and disease characteristics only, the mean predicted probabilities of the event across the facility-characteristic quartiles are very similar to each other and to the overall crude rate (within a thousandth).
What I am struggling to understand is this: when I add facility-characteristic quartile to that model and request the predicted probabilities, the mean predicted probabilities within the facility-characteristic quartiles are equal to the crude rates. I understand that the mean predicted probabilities will equal the crude rates within each quartile when quartile is the only predictor in the model, but here it is not the only predictor. Can anyone help me understand why this is? Happy to provide coefficients and output.
With much appreciation!
06-24-2016 08:49 AM
I think your question would be clearer if you posted the SAS code that you are using. As written, it is difficult to follow. My best guess is that the coefficient of the "facility characteristic" is relatively small, so that the predictions of the model WITHOUT the "facility characteristic" are essentially the same as those of the model WITH it.
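Separately from that guess, one general property of maximum likelihood logistic regression may be relevant to the quartile means. The sketch below assumes that the quartile enters the model as a categorical (CLASS) effect alongside an intercept, that the fit converged, and that the predicted probabilities are computed on the same data used to fit the model. The symbols (y_i for the 0/1 outcome, \hat{p}_i for the predicted probability, x_{ij} for column j of the design matrix, n_q for the number of patients in quartile q) are my own notation, not taken from the original post.

```latex
% Likelihood (score) equations for a logit-link model: one equation per design column j
\sum_{i} x_{ij}\,\bigl(y_i - \hat{p}_i\bigr) = 0 \qquad \text{for every column } j .

% The intercept and quartile indicator columns together span the indicator of each quartile q,
% so the residuals sum to zero within each quartile:
\sum_{i \in q} \bigl(y_i - \hat{p}_i\bigr) = 0
\quad\Longrightarrow\quad
\frac{1}{n_q}\sum_{i \in q} \hat{p}_i \;=\; \frac{1}{n_q}\sum_{i \in q} y_i .
```

Under those assumptions, the mean predicted probability within each quartile matches the crude rate in that quartile no matter which other covariates are in the model. The same argument does not apply when the quartile is left out of the model, or when it enters as a single continuous term.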
In general, I find that interpretation is improved by graphing the probabilities. The easiest way to do that is to use the EFFECTPLOT statement. I've written an article about how to use the EFFECTPLOT statement to visualize predicted values of regression models, and the example is a logistic model, so you should be able to adapt the code in the blog post to your case.
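Since the original code was not posted, here is a minimal sketch of what such an analysis might look like. All data set and variable names (WORK.PATIENTS, EVENT, AGE, SEVERITY, FACQ, PHAT) are hypothetical placeholders, and EVENT is assumed to be a numeric 0/1 variable so that its mean is the crude event rate.

```sas
/* Hypothetical names: WORK.PATIENTS is the analysis data set, EVENT is a numeric
   0/1 response, AGE and SEVERITY stand in for patient and disease covariates,
   and FACQ is the facility-characteristic quartile (1-4). */
ods graphics on;

proc logistic data=work.patients;
   class facq (ref='1') / param=ref;
   model event(event='1') = age severity facq;
   /* Predicted probability versus AGE, one curve per facility quartile;
      covariates not shown in the plot are fixed at default values */
   effectplot slicefit(x=age sliceby=facq);
   /* Save the observation-level predicted probabilities as PHAT */
   output out=work.pred p=phat;
run;

/* Crude event rate and mean predicted probability within each quartile */
proc means data=work.pred mean maxdec=4;
   class facq;
   var event phat;
run;
```

Running the PROC MEANS step for the model with and without FACQ puts the two sets of quartile means side by side, which is the comparison described in the question.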