I have a dataset with the following form:
year location outcome covariate1 covariate2 covariate3
2007 00001 1 29.7 15.4 1
2008 00001 1 22.5 23.7 3
2007 00002 0 15.5 33.8 2
2008 00002 1 20.9 19.3 2
Outcome is binary, and covariates are a mix of categorical and continuous. What is a good approach to determine if my logistic regression needs to account for the repeated measures?
Interesting question. You could try modeling it with PROC GLIMMIX as a repeated-measures analysis. If you request confidence limits for the covariance parameters (in GLIMMIX that is the CL option on the COVTEST statement; CL on the RANDOM statement gives limits for the random-effect solutions instead), you will get a confidence interval for the covariance between the two years, or for the correlation if you use a correlation-parameterized structure. If the interval includes zero (equivalently, the parameter estimate is not significantly different from zero), that is some evidence that Year=2007 and Year=2008 can be treated as independent samples.
And yet, you might still want to consider the two years as non-independent. To me, one of the really nice things about modeling the error structure is that you can accommodate even small correlations. With only two repeated observations, an unstructured covariance matrix is the most flexible statement of the situation. Yes, the two years may have only a small correlation, but if so, that will not greatly affect any standard errors of the difference of fixed effect means. And there may not be enough data to adequately estimate confidence bounds, so that 0 might be included, even if the covariance is substantially away from 0.
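As a rough sketch of what such a model might look like (not a tested program): the data set name mydata is hypothetical, treating covariate3 as categorical is an assumption, and the variable names are taken from the sample data in the question. The R-side RANDOM statement with TYPE=UN fits an unstructured covariance across the two years within each location, and COVTEST / CL asks for confidence limits on the covariance parameters.

```sas
/* Sketch only: mydata is a hypothetical data set name,           */
/* and covariate3 is assumed to be the categorical covariate.     */
proc glimmix data=mydata;
   class location year covariate3;
   /* Logistic model for the binary outcome, modeling outcome=1 */
   model outcome(event='1') = year covariate1 covariate2 covariate3
         / dist=binary link=logit solution;
   /* R-side (residual) unstructured covariance for the two years */
   /* measured on the same location                               */
   random year / subject=location type=un residual;
   /* Confidence limits for the covariance parameter estimates */
   covtest / cl;
run;
```

The row to look at in the Covariance Parameter Estimates table is the off-diagonal term, UN(2,1). With only two years per location its interval may be wide, so an interval that covers zero should be read with the caveat above in mind.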
My mantra is: If you measure the same experimental unit for the same endpoint, you ought to assume that those points are likely to be more closely related than points from independent units.
Steve Denham