Dear all,
How are you?
I'm writing to ask for help with the large standard errors from my GEE.
Is it correct to report the results given such large standard errors?
Your insight is greatly appreciated.
Kind regards,
KC
proc genmod data=long;
class ID year scale01;
model y=year scale01 scale01*year /type3;
repeated subject=ID / type=unstr covb corrw;
lsmeans scale01*year / cl;
run;
Year*scale01 Least Squares Means
Year | scale01 | Estimate | Standard Error | z Value | Pr > |z| | Alpha | Lower | Upper |
2013 | 1=Disagree a lot | 3151.32 | 524.47 | 6.01 | <.0001 | 0.05 | 2123.4 | 4179.3 |
2013 | 2=Disagree a little | 3830.37 | 518.49 | 7.39 | <.0001 | 0.05 | 2814.2 | 4846.6 |
2013 | 3=Neither agree nor disagree | 3987.09 | 645.05 | 6.18 | <.0001 | 0.05 | 2722.8 | 5251.4 |
2013 | 4=Agree a little | 4791.8 | 416.07 | 11.52 | <.0001 | 0.05 | 3976.3 | 5607.3 |
2013 | 5=Agree a lot | 10020 | 1706.07 | 5.87 | <.0001 | 0.05 | 6676 | 13364 |
2014 | 1=Disagree a lot | 2750.11 | 481.45 | 5.71 | <.0001 | 0.05 | 1806.5 | 3693.7 |
2014 | 2=Disagree a little | 3147.5 | 469.63 | 6.7 | <.0001 | 0.05 | 2227.1 | 4068 |
2014 | 3=Neither agree nor disagree | 3515.44 | 816.01 | 4.31 | <.0001 | 0.05 | 1916.1 | 5114.8 |
2014 | 4=Agree a little | 4166.6 | 418.75 | 9.95 | <.0001 | 0.05 | 3345.9 | 4987.3 |
2014 | 5=Agree a lot | 5754.81 | 782.82 | 7.35 | <.0001 | 0.05 | 4220.5 | 7289.1 |
How did you conclude that the standard errors are large? Most are an order of magnitude less than the estimate, which for almost all problems is a pretty good fit.
What is the response variable here? I notice a strong trend with increasing scale01 score. Perhaps you could look at the trend in the two years, and have greater precision for your questions.
Steve Denham
Hi Steve.
Thanks for your response.
As far as I understand, the standard error is a measure of estimate precision.
I have never seen such large standard errors, so I am puzzled about whether something is wrong that I don't understand.
FYI, y is the alcohol consumption (mL) in the past 6 months, and I would like to know how y changes over the 2 years in relation to scale01. The partial output below shows the estimated y for the 2 years from the same GEE.
Please enlighten me on this. Thank you very much.
Year Least Squares Means
Year | Estimate | Standard Error | z Value | Pr > |z| | Alpha | Lower | Upper |
2013 | 4661.38 | 352.47 | 13.22 | <.0001 | 0.05 | 3970.55 | 5352.22 |
2014 | 3448.69 | 221.80 | 15.55 | <.0001 | 0.05 | 3013.98 | 3883.41 |
Differences of Year Least Squares Means
Year | _Year | Estimate | Standard Error | z Value | Pr > |z| | Alpha | Lower | Upper |
2013 | 2014 | 1212.69 | 261.05 | 4.65 | <.0001 | 0.05 | 701.04 | 1724.35 |
These are not unreasonable standard errors at all. The lower and upper confidence bounds are computed from the standard errors, and they give a range of about 4 to 5.4 liters in 2013 and 3 to 3.9 liters in 2014. These seem reasonable on all counts.
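As a rough check of where those bounds come from, here is a minimal sketch that recomputes the 2013 interval from the estimate and standard error in the Year Least Squares Means table. It assumes the usual Wald form (estimate plus or minus a normal quantile times the standard error). The data set and variable names (ci_check, est, se, and so on) are made up for illustration.

```sas
/* Hypothetical check: rebuild the 2013 Wald interval by hand */
data ci_check;
   est = 4661.38;                     /* 2013 LS-mean from the output above */
   se  = 352.47;                      /* its standard error */
   z   = quantile('NORMAL', 0.975);   /* two-sided 95% normal quantile */
   lower  = est - z*se;               /* lower confidence bound */
   upper  = est + z*se;               /* upper confidence bound */
   rel_se = se / est;                 /* standard error relative to the estimate */
run;

proc print data=ci_check noobs;
run;
```

The printed bounds should agree with the reported 3970.55 and 5352.22 up to rounding, and rel_se comes out at roughly 0.08, which is one way to see that the standard error is small relative to the estimate.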
Perhaps I am misunderstanding--about what value would you expect the standard error to be?
Steve Denham
Hi Steve.
Thanks for your response again.
I guess I don't understand the meaning of standard error very well.
Can I please also ask whether my continuous response variable has to be normally distributed for the result to be valid?
I have read that GEE has weaker distributional assumptions, hence I did not use the transformed y for the GEE. Nevertheless, I tried a log transformation, but a square root transformation does a better job. If I use it, though, another problem arises in terms of understanding and interpreting the result.
Perhaps you could shed some light on this again? Thank you very much.
One of the true advantages of GENMOD (and GLIMMIX) is the ability to specify the distribution that applies to the data (or residuals, depending on specification). You are not restricted to a normal distribution--there are a variety to choose from. Your choice will depend on the process (in a probabilistic sense) by which the data are generated. If you don't have a good idea about this, an empirical approach can be taken.
For instance, you could specify a "square root" transformation through the use of the FWDLINK and INVLINK statements:
fwdlink link = sqrt(_MEAN_);
invlink ilink = (_XBETA_)*(_XBETA_);
Or through the use of programming statements, you could create your own distribution. This, however, is not for the inexperienced.
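For the built-in route of specifying a distribution, a sketch along these lines might serve as a starting point. The gamma distribution with a log link is only an assumed example for a positive, right-skewed response; it was not suggested in this thread, and it would not accept zero values of y. The data set and variable names are taken from the original post.

```sas
/* Sketch only: GEE with an assumed gamma distribution and log link */
proc genmod data=long;
   class ID year scale01;
   /* DIST= and LINK= select a built-in distribution and link function */
   model y = year scale01 scale01*year / dist=gamma link=log type3;
   repeated subject=ID / type=unstr covb corrw;
   /* ILINK adds the means back-transformed to the original scale */
   lsmeans scale01*year / ilink cl;
run;
```

Whether this choice is sensible depends on how the consumption data were generated, so it should be checked against plots of the data before any results are reported.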
Steve Denham
Hi Steve.
Thank you very much for your valuable information.
I'll go and explore it a bit more.
Hi Steve.
How are you?
I tried what you suggested by specifying the FWDLINK and INVLINK statements, and the estimates changed dramatically.
How would I check which GEE is more appropriate?
Thank you very much.
proc genmod data=long;
class ID year scale01;
fwdlink link = sqrt(_MEAN_);
invlink ilink = (_XBETA_)*(_XBETA_);
model y=year scale01 scale01*year /type3;
repeated subject=ID / type=unstr covb corrw;
lsmeans scale01*year / cl;
run;
Year*rAttitudes05 Least Squares Means
Year | scale01 | Estimate | Standard Error | z Value | Pr > |z| | Alpha | Lower | Upper |
2013 | 1=Disagree a lot | 50.181 | 1.9817 | 25.32 | <.0001 | 0.05 | 46.297 | 54.065 |
2013 | 2=Disagree a little | 56.226 | 2.1635 | 25.99 | <.0001 | 0.05 | 51.985 | 60.466 |
2013 | 3=Neither agree nor disagree | 59.317 | 4.1629 | 14.25 | <.0001 | 0.05 | 51.157 | 67.476 |
2013 | 4=Agree a little | 66.594 | 2.0722 | 32.14 | <.0001 | 0.05 | 62.533 | 70.656 |
2013 | 5=Agree a lot | 98.358 | 8.3331 | 11.8 | <.0001 | 0.05 | 82.025 | 114.69 |
2014 | 1=Disagree a lot | 47.24 | 1.9806 | 23.85 | <.0001 | 0.05 | 43.358 | 51.122 |
2014 | 2=Disagree a little | 50.945 | 1.9992 | 25.48 | <.0001 | 0.05 | 47.026 | 54.863 |
2014 | 3=Neither agree nor disagree | 55.799 | 6.4372 | 8.67 | <.0001 | 0.05 | 43.182 | 68.416 |
2014 | 4=Agree a little | 61.939 | 2.2397 | 27.66 | <.0001 | 0.05 | 57.549 | 66.329 |
2014 | 5=Agree a lot | 73.936 | 5.0621 | 14.61 | <.0001 | 0.05 | 64.014 | 83.857 |
Umm, I don't think they are all that different on the original scale. Try this:
lsmeans scale01*year / ilink cl;
This should update the display to include the mean on the original (untransformed) scale.
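To see by hand what the ILINK option reports, the sketch below applies the inverse link from the posted code (squaring) to a few rows of the square-root-scale output. The data set name backtransform and the numeric coding of scale01 are assumptions for illustration; the estimates and bounds are copied from the table above.

```sas
/* Hypothetical by-hand back-transformation of selected LS-means */
data backtransform;
   input year scale01 est lower upper;
   mean_orig  = est**2;     /* inverse of the square root link */
   lower_orig = lower**2;   /* bounds are positive, so squaring keeps their order */
   upper_orig = upper**2;
   datalines;
2013 1 50.181 46.297 54.065
2013 5 98.358 82.025 114.69
2014 1 47.24 43.358 51.122
2014 5 73.936 64.014 83.857
;
run;

proc print data=backtransform noobs;
run;
```

The first row, for example, back-transforms to a mean of roughly 2518 on the original scale, which can then be set beside the corresponding estimate from the untransformed model.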
Now, as far as selecting the proper distribution goes, that is art as much as science. Plots of the data, examination of residuals, consideration of the physical processes that generate the data, and interpretability all enter in. The usual tools, like comparing various information criteria, are not so useful, as they depend on the form of the data, and once things are "transformed" as in a generalized linear model, it is like comparing apples to watermelons.
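One possible way to do the residual examination mentioned above is sketched here, using the original identity-link model. The output data set name diag and the variable names pred and resraw are assumptions chosen for illustration.

```sas
/* Sketch: save predicted values and raw residuals from the GEE fit */
proc genmod data=long;
   class ID year scale01;
   model y = year scale01 scale01*year / type3;
   repeated subject=ID / type=unstr;
   output out=diag pred=pred resraw=resraw;
run;

/* Residuals against predicted values: look for fanning or curvature */
proc sgplot data=diag;
   scatter x=pred y=resraw;
   refline 0 / axis=y;
run;

/* Distribution of the residuals: look for strong skewness */
proc sgplot data=diag;
   histogram resraw;
run;
```

The same two plots can be produced for each candidate model and compared by eye, which fits the "art" side of the choice rather than giving a formal test.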
One thing that might tell you how well things are fitting is the length of the confidence bounds, on the original scale. Shorter means a more precise estimate--but again that is "art" and not rigorous, and tells you little about the accuracy of the estimation. And be sure, changing the distribution will have drastic effects on the location estimate.
Steve Denham