Miracle
Barite | Level 11

Dear all,

How are you?

I'm writing to ask for help about the large standard errors from my GEE.

Is it correct to report the result given the very large standard errors?

Your insight is greatly appreciated.


Kind regards,

KC

proc genmod data=long;
  class ID year scale01;
  model y = year scale01 scale01*year / type3;
  repeated subject=ID / type=unstr covb corrw;
  lsmeans scale01*year / cl;
run;

Year*scale01 Least Squares Means

Year  scale01                       Estimate  Standard Error  z Value  Pr > |z|  Alpha    Lower    Upper
2013  1=Disagree a lot               3151.32          524.47     6.01    <.0001   0.05   2123.4   4179.3
2013  2=Disagree a little            3830.37          518.49     7.39    <.0001   0.05   2814.2   4846.6
2013  3=Neither agree nor disagree   3987.09          645.05     6.18    <.0001   0.05   2722.8   5251.4
2013  4=Agree a little                4791.8          416.07    11.52    <.0001   0.05   3976.3   5607.3
2013  5=Agree a lot                    10020         1706.07     5.87    <.0001   0.05     6676    13364
2014  1=Disagree a lot               2750.11          481.45     5.71    <.0001   0.05   1806.5   3693.7
2014  2=Disagree a little             3147.5          469.63      6.7    <.0001   0.05   2227.1     4068
2014  3=Neither agree nor disagree   3515.44          816.01     4.31    <.0001   0.05   1916.1   5114.8
2014  4=Agree a little                4166.6          418.75     9.95    <.0001   0.05   3345.9   4987.3
2014  5=Agree a lot                  5754.81          782.82     7.35    <.0001   0.05   4220.5   7289.1
SteveDenham
Jade | Level 19

How did you conclude that the standard errors are large?  Most are an order of magnitude less than the estimate, which for almost all problems is a pretty good fit.
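
One rough way to put a number on this is the relative standard error, that is, the standard error divided by the estimate. The data step below is only a sketch that retypes the posted LS-means by hand; the data set name rel_se and its variable names are hypothetical and not part of the original run. For these ten cells the ratio comes out between roughly 9% and 23%.

```sas
/* Relative standard error (SE / estimate) for the posted
   Year*scale01 LS-means. Values are retyped from the table above. */
data rel_se;
   input year scale01 estimate se;
   rel_se = se / estimate;      /* precision relative to the size of the estimate */
   format rel_se percent8.1;
   datalines;
2013 1 3151.32 524.47
2013 2 3830.37 518.49
2013 3 3987.09 645.05
2013 4 4791.8 416.07
2013 5 10020 1706.07
2014 1 2750.11 481.45
2014 2 3147.5 469.63
2014 3 3515.44 816.01
2014 4 4166.6 418.75
2014 5 5754.81 782.82
;
run;

proc print data=rel_se noobs;
run;
```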

What is the response variable here?  I notice a strong trend with increasing scale01 score.  Perhaps you could look at the trend in the two years, and have greater precision for your questions.

Steve Denham

Miracle
Barite | Level 11

Hi Steve.

Thanks for your response.

As far as I understand, the standard error is a measure of estimate precision.

I have never seen such large standard errors, so I am puzzled and wonder whether something is wrong that I don't understand.

FYI, y is alcohol consumption (mL) in the past 6 months, and I would like to know how y changes over the 2 years in relation to scale01. The partial output below gives the estimated y for the 2 years from the same GEE.

Please enlighten me on this. Thank you very much.


Year Least Squares Means

Year    Estimate    Standard Error    z Value    Pr > |z|    Alpha      Lower      Upper
2013     4661.38            352.47      13.22      <.0001     0.05    3970.55    5352.22
2014     3448.69            221.80      15.55      <.0001     0.05    3013.98    3883.41

Differences of Year Least Squares Means

Year    _Year    Estimate    Standard Error    z Value    Pr > |z|    Alpha      Lower      Upper
2013    2014      1212.69            261.05       4.65      <.0001     0.05     701.04    1724.35

SteveDenham
Jade | Level 19

These are not unreasonable standard errors at all. The lower and upper confidence bounds are based on the standard errors, and you get a range of about 4 to 5.4 liters in 2013 and 3 to 3.9 liters in 2014. These seem reasonable on all counts.
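
To see how the bounds follow from the standard errors: they appear to be ordinary Wald limits, the estimate plus or minus a standard normal quantile times the standard error. A minimal sketch, retyping the two Year LS-means from the output above (the data set name wald_cl and the variable names are hypothetical):

```sas
/* Wald confidence limits: estimate +/- z * SE, with z the
   standard normal quantile for a two-sided 95% interval. */
data wald_cl;
   input year estimate se;
   alpha = 0.05;
   z     = probit(1 - alpha/2);   /* about 1.96 */
   lower = estimate - z*se;
   upper = estimate + z*se;
   datalines;
2013 4661.38 352.47
2014 3448.69 221.80
;
run;

proc print data=wald_cl noobs;
   format lower upper 10.2;
run;
```

The computed limits should agree with the posted Lower and Upper columns up to rounding.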

Perhaps I am misunderstanding--about what value would you expect the standard error to be?

Steve Denham

Miracle
Barite | Level 11

Hi Steve.

Thanks for your response again.

I guess I don't understand the meaning of standard error very well.

Can I please also ask whether my continuous response variable has to be approximately normally distributed for the result to be valid?

I have read that GEE has weaker distributional assumptions, hence I did not use the transformed y for GEE. Nevertheless, I tried a log transformation, but a square root transformation does a better job. If I use it, though, another problem arises in terms of understanding and interpreting the result.


Perhaps you could shed some light on this again? Thank you very much.

SteveDenham
Jade | Level 19

One of the true advantages of GENMOD (and GLIMMIX) is the ability to specify the distribution that applies to the data (or residuals, depending on specification). You are not restricted to a normal distribution; there are a variety to choose from. Your choice will depend on the process (in a probabilistic sense) by which the data are generated. If you don't have a good idea on this, an empirical approach can be taken.

For instance, you could specify a "square root" transformation through the use of the FWDLINK and INVLINK statements:

fwdlink link = sqrt(_MEAN_);
invlink ilink = (_XBETA_)*(_XBETA_);

Or through the use of programming statements, you could create your own distribution.  This, however, is not for the inexperienced.
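
Short of writing your own distribution, a built-in distribution and link can also be named directly on the MODEL statement. The sketch below reuses the data set and variable names from the original post. The gamma distribution with a log link is only an illustrative assumption (nobody in this thread has checked that it suits these data), and it requires y to be strictly positive, so it would not work as written if some subjects report zero consumption.

```sas
/* Sketch only: built-in distribution and link instead of
   FWDLINK/INVLINK. Gamma/log is an illustrative assumption
   and requires y > 0 for every observation. */
proc genmod data=long;
   class ID year scale01;
   model y = year scale01 scale01*year / dist=gamma link=log type3;
   repeated subject=ID / type=unstr corrw;
   lsmeans scale01*year / ilink cl;   /* ILINK reports means on the original scale */
run;
```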

Steve Denham


Miracle
Barite | Level 11

Hi Steve.

Thank you very much for your valuable information.

I'll go and explore it a bit more :)

Miracle
Barite | Level 11

Hi Steve.

How are you?

I tried what you suggested by specifying the fwdlink and invlink. And the estimates change dramatically.

How would I check which GEE is more appropriate?

Thank you very much.


proc genmod data=long;
  class ID year scale01;
  fwdlink link = sqrt(_MEAN_);
  invlink ilink = (_XBETA_)*(_XBETA_);
  model y = year scale01 scale01*year / type3;
  repeated subject=ID / type=unstr covb corrw;
  lsmeans scale01*year / cl;
run;

Year*rAttitudes05 Least Squares Means

Year  scale01                       Estimate  Standard Error  z Value  Pr > |z|  Alpha    Lower    Upper
2013  1=Disagree a lot                50.181          1.9817    25.32    <.0001   0.05   46.297   54.065
2013  2=Disagree a little             56.226          2.1635    25.99    <.0001   0.05   51.985   60.466
2013  3=Neither agree nor disagree    59.317          4.1629    14.25    <.0001   0.05   51.157   67.476
2013  4=Agree a little                66.594          2.0722    32.14    <.0001   0.05   62.533   70.656
2013  5=Agree a lot                   98.358          8.3331     11.8    <.0001   0.05   82.025   114.69
2014  1=Disagree a lot                 47.24          1.9806    23.85    <.0001   0.05   43.358   51.122
2014  2=Disagree a little             50.945          1.9992    25.48    <.0001   0.05   47.026   54.863
2014  3=Neither agree nor disagree    55.799          6.4372     8.67    <.0001   0.05   43.182   68.416
2014  4=Agree a little                61.939          2.2397    27.66    <.0001   0.05   57.549   66.329
2014  5=Agree a lot                   73.936          5.0621    14.61    <.0001   0.05   64.014   83.857


SteveDenham
Jade | Level 19

Umm, I don't think they are all that different on the original scale.  Try this:

lsmeans scale01*year / ilink cl;

This should expand the display to include the mean on the original (untransformed) scale.
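
For the square-root link used here, the inverse link is just squaring, so the original-scale means can also be checked by hand. The data step below is a rough sketch that retypes a few rows of the square-root-scale output posted above; the data set name back_transformed and its variable names are hypothetical. It squares the estimate and both limits, which as far as I understand is what ILINK does for the mean and its confidence limits.

```sas
/* Back-transform square-root-scale LS-means to the original
   scale by applying the inverse link (squaring). */
data back_transformed;
   input year scale01 est_sqrt lower_sqrt upper_sqrt;
   mean_orig  = est_sqrt**2;
   lower_orig = lower_sqrt**2;
   upper_orig = upper_sqrt**2;
   datalines;
2013 1 50.181 46.297 54.065
2013 5 98.358 82.025 114.69
2014 1 47.24 43.358 51.122
2014 5 73.936 64.014 83.857
;
run;

proc print data=back_transformed noobs;
   format mean_orig lower_orig upper_orig 10.1;
run;
```

The squared values can then be set side by side with the identity-link table from the first run.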

Now, so far as selecting the proper distribution.  That is art, as much as science.  Plots of data, examination of residuals, consideration of the physical processes that generate the data and interpretability all enter in.  The usual things, like comparing various information criteria, are not so useful, as they depend on the form of the data, and once things are "transformed" as in a generalized linear model, it is like comparing apples to watermelons.
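
As one possible starting point for the residual examination, the sketch below refits the original identity-link model, writes predicted values and raw residuals to a data set, and plots them. The names gee_out, pred and resid are hypothetical. I am assuming the PRED= and RESRAW= keywords of the OUTPUT statement are available when a REPEATED statement is present, which is worth confirming against the GENMOD documentation for your release.

```sas
/* Sketch: save predicted values and raw residuals from the GEE fit */
proc genmod data=long;
   class ID year scale01;
   model y = year scale01 scale01*year / type3;
   repeated subject=ID / type=unstr;
   output out=gee_out pred=pred resraw=resid;
run;

/* Residuals against fitted values: look for fanning or curvature */
proc sgplot data=gee_out;
   scatter x=pred y=resid;
   refline 0 / axis=y;
run;

/* Shape of the residual distribution, with a normal curve overlaid */
proc univariate data=gee_out noprint;
   var resid;
   histogram resid / normal;
run;
```

A spread of residuals that widens with the fitted value, or a strongly skewed histogram, would be a hint that a distribution other than the normal deserves a look.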

One thing that might tell you how well things are fitting is the length of the confidence bounds, on the original scale.  Shorter means a more precise estimate--but again that is "art" and not rigorous, and tells you little about the accuracy of the estimation. And be sure, changing the distribution will have drastic effects on the location estimate.

Steve Denham
