Hello,
I am conducting a repeated measures analysis with a normally distributed dependent variable. I have 3 time points, but my data are imbalanced and not all observations have 3 time points. I have roughly 8k subjects. As a result, I chose to use proc genmod to conduct a regression. My understanding with proc genmod is that it produces estimates that are robust to the covariance structure. However, with my data, specifying different covariance structures produces substantially different coefficient estimates and standard errors/p-values. If genmod is supposed to produce robust estimates, why would the regression results change dramatically?
/*unstructured matrix*/
proc genmod data=dat;
class id time(ref = "0");
model y = x x_2 time /dist = normal;
repeated subject = id/type = un corrw covb;
run;
/*independent matrix*/
proc genmod data=dat;
class id time(ref = "0");
model y = x x_2 time /dist = normal;
repeated subject = id;
run;
/*autoregressive matrix*/
proc genmod data=dat;
class id time(ref = "0");
model y = x x_2 time /dist = normal;
repeated subject = id/type = ar(1);
run;
The "robust" properties of the GEE models that PROC GENMOD fits when you specify a REPEATED statement are primarily asymptotic: the parameter estimates and the estimate of their covariance matrix are consistent even if the working correlation structure is misspecified (given the appropriate regularity conditions). Changing the working correlation structure changes the estimating equations, so you should expect to see differences in the parameter estimates in any finite sample. The robustness properties also come at the cost of a decrease in the efficiency of the estimates. In addition, when the number of observations per subject is unequal, a more complex working correlation structure can cause problems if only a small number of subjects contribute to the estimation of some elements of the working correlation matrix.
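As a minimal sketch of the unequal-observations point, and assuming the `time` variable from the original post identifies which visit each row belongs to, the WITHIN= option of the REPEATED statement tells GENMOD how to line up each subject's observations with the rows and columns of the working correlation matrix. Without it, a subject with a missing visit is treated as if the remaining measurements were consecutive, which matters for TYPE=UN and TYPE=AR(1). The MODELSE option is added here only to print model-based standard errors next to the empirical ones; a large gap between the two can hint that the working structure fits poorly. This is an illustration under those assumptions, not something verified against the poster's data.

```sas
/* Sketch: same model as in the question, with WITHIN= so that each   */
/* observation is matched to its actual time point when a subject is  */
/* missing one or more visits. Assumes time is the visit indicator.   */
proc genmod data=dat;
   class id time(ref="0");
   model y = x x_2 time / dist=normal;
   /* within=time : orders measurements within subject by time        */
   /* corrw       : prints the estimated working correlation matrix   */
   /* modelse     : also prints model-based standard errors           */
   repeated subject=id / within=time type=un corrw modelse;
run;
```

The same WITHIN= specification can be kept while switching TYPE= to ar(1) or ind, so that the comparison across working structures is made with the time points aligned in the same way.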
It is a mixed model thing.
Different covariance structures make different assumptions, which produce different estimated coefficients.
@Rick_SAS wrote a blog post about it recently.
Thank you @Ksharp
I found this blog post by @Rick_SAS about longitudinal mixed effects models. However, it still doesn't explain why the estimates would vary dramatically according to the correlation structure, especially with proc genmod, which should be robust to misspecifying the correlation structure.
Do you know of any other resources that I could look to?
Thanks in advance.
I don't feel it is unexpected that changing the correlation structure can have big impacts on the estimates of the parameters.
This paper gives advice on how to choose the proper correlation structure: https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/198-30.pdf