Does adding the "rsquare" option after the model not give the correct r-squared statistic? Should I run it in a proc genmod instead? Thanks for your help!
According to the documentation of PROC LOGISTIC:
R-square statistics are most useful for comparing competing models that are not necessarily nested—larger values indicate better models.
By the way, the RSQUARE option produces generalized R-squared values that are appropriate for the logistic case. I don't think these can be interpreted in the same way as the regular R-squared that is produced when you have a continuous Y.
But as I understand the situation you have described, you are not comparing competing models; you have a single model with a low generalized R-squared of 0.04, which by itself is meaningless. I think the generalized R-squared is the wrong statistic for your situation, and the c statistic and the Percent Concordant statistic might be more meaningful.
You asked about adding a fixed effect. Did you try that yet?
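As a minimal sketch of where these statistics come from: PROC LOGISTIC prints an "Association of Predicted Probabilities and Observed Responses" table by default, which includes Percent Concordant and c, while RSQUARE is an option on the MODEL statement. The data set name mydata, the response outcome, and the predictors x1 and x2 are placeholders, not names from this thread.

```sas
/* Sketch only: mydata, outcome, x1 and x2 are hypothetical names. */
ods graphics on;

proc logistic data=mydata plots(only)=roc;
   /* RSQUARE requests the generalized R-squared values.            */
   /* The association table (Percent Concordant, c) is printed by   */
   /* default, and the ROC plot reports the area under the curve,   */
   /* which equals c.                                               */
   model outcome(event='1') = x1 x2 / rsquare;
run;
```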
Ah, I see, my apologies regarding the r-squared statistic. I'm going to work on improving the C statistic per the instructions from Ksharp below.
For the fixed effects model code, do I need to narrow down the facilities to only those with a certain number of observations using proc sql?
When I re-ran the model using the class statement and adding the facility key into the model, I got these warnings:
WARNING: There is possibly a quasi-complete separation of data points. The maximum likelihood
estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are
based on the last maximum likelihood iteration. Validity of the model fit is
questionable.
For the odds ratio estimates, it compared some of the facilities to fac_key 99, which I don't really understand. I included a screenshot sample of this below.
Thanks again for your help.
Going back to your original question, which I didn't see at the time: you asked about fitting a fixed effects model. In fact, both a fixed effects model (also called a conditional logistic model) and a Generalized Estimating Equations (GEE) model are reasonable approaches for your situation of having data from each of many facilities, if you assume that observations within a facility are correlated while observations across facilities are not.
As mentioned in this note on the types of logistic models available, the fixed effects model can be fit in PROC LOGISTIC by specifying your facilities variable in the STRATA statement. The GEE model can be fit in PROC GENMOD by specifying your facilities variable in the REPEATED statement. Both approaches account for the correlation within facilities while avoiding the addition of a parameter for each facility.
If you instead include your facilities variable in the CLASS and MODEL statements, the resulting unconditional model must estimate the entire set of facility parameters, and this can cause estimation problems such as separation.
You can see examples of both models in the LOGISTIC and GENMOD documentation. Also see the book by Allison on the fixed effects model, referenced in the note I mention above. It has a chapter on this model in the binary response setting.
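The two approaches described above might look roughly like the following. This is a sketch under assumptions, not code taken from the documentation: fac_key is the facility variable mentioned earlier in the thread, while mydata, outcome, x1 and x2 are placeholder names, and the exchangeable working correlation (TYPE=EXCH) is just one plausible choice.

```sas
/* Sketch only: mydata, outcome, x1 and x2 are hypothetical names. */

/* Conditional (fixed effects) logistic model: the facility goes in */
/* STRATA, so no per-facility parameters are estimated.             */
proc logistic data=mydata;
   strata fac_key;
   model outcome(event='1') = x1 x2;
run;

/* GEE model: the facility is the clustering subject in REPEATED.   */
/* The SUBJECT= variable must also appear in the CLASS statement.   */
proc genmod data=mydata;
   class fac_key;
   model outcome(event='1') = x1 x2 / dist=binomial link=logit;
   repeated subject=fac_key / type=exch;  /* assumed working correlation */
run;
```

Note that in the conditional model, facilities in which every observation has the same response value contribute nothing to the conditional likelihood, so they do not inform the estimates.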
