Large standard errors from GEE

04-21-2015 01:25 AM

Dear all,

How are you?

I'm writing to ask for help with the large standard errors from my GEE model.

Is it appropriate to report the results given the very large standard errors?

Your insight is greatly appreciated.

Kind regards,

KC

proc genmod data=long;
  class ID year scale01;
  model y = year scale01 scale01*year / type3;
  repeated subject=ID / type=unstr covb corrw;
  lsmeans scale01*year / cl;
run;

Year*scale01 Least Squares Means

Year | scale01 | Estimate | Standard Error | z Value | Pr > |z| | Alpha | Lower | Upper |
2013 | 1=Disagree a lot | 3151.32 | 524.47 | 6.01 | <.0001 | 0.05 | 2123.4 | 4179.3 |
2013 | 2=Disagree a little | 3830.37 | 518.49 | 7.39 | <.0001 | 0.05 | 2814.2 | 4846.6 |
2013 | 3=Neither agree nor disagree | 3987.09 | 645.05 | 6.18 | <.0001 | 0.05 | 2722.8 | 5251.4 |
2013 | 4=Agree a little | 4791.8 | 416.07 | 11.52 | <.0001 | 0.05 | 3976.3 | 5607.3 |
2013 | 5=Agree a lot | 10020 | 1706.07 | 5.87 | <.0001 | 0.05 | 6676 | 13364 |
2014 | 1=Disagree a lot | 2750.11 | 481.45 | 5.71 | <.0001 | 0.05 | 1806.5 | 3693.7 |
2014 | 2=Disagree a little | 3147.5 | 469.63 | 6.7 | <.0001 | 0.05 | 2227.1 | 4068 |
2014 | 3=Neither agree nor disagree | 3515.44 | 816.01 | 4.31 | <.0001 | 0.05 | 1916.1 | 5114.8 |
2014 | 4=Agree a little | 4166.6 | 418.75 | 9.95 | <.0001 | 0.05 | 3345.9 | 4987.3 |
2014 | 5=Agree a lot | 5754.81 | 782.82 | 7.35 | <.0001 | 0.05 | 4220.5 | 7289.1 |

04-21-2015 08:34 AM

How did you conclude that the standard errors are large? Most are an order of magnitude less than the estimate, which for almost all problems is a pretty good fit.
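To make the comparison between standard error and estimate concrete, here is a minimal sketch that computes the ratio of each standard error to its estimate for the Year*scale01 LS-means posted above. The dataset and variable names (lsm_check, level, rel_se) are hypothetical and only serve the illustration; level is the numeric code of scale01.

```sas
/* Illustrative check only: relative standard error (SE / estimate)
   for the Year*scale01 LS-means from the first model.
   lsm_check, level and rel_se are hypothetical names. */
data lsm_check;
  input year level estimate se;
  rel_se = se / estimate;      /* standard error as a fraction of the estimate */
  format rel_se 6.3;
  datalines;
2013 1 3151.32 524.47
2013 2 3830.37 518.49
2013 3 3987.09 645.05
2013 4 4791.8 416.07
2013 5 10020 1706.07
2014 1 2750.11 481.45
2014 2 3147.5 469.63
2014 3 3515.44 816.01
2014 4 4166.6 418.75
2014 5 5754.81 782.82
;
run;

proc print data=lsm_check noobs;
run;
```

For these values the ratios range from about 0.09 to about 0.23, so each standard error is a modest fraction of its estimate.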

What is the response variable here? I notice a strong trend with increasing scale01 score. Perhaps you could look at the trend in the two years, and have greater precision for your questions.

Steve Denham

04-21-2015 08:50 PM

Hi Steve.

Thanks for your response.

As far as I understand, the standard error is a measure of estimate precision.

I have never seen such large standard errors, so I am puzzled and wonder whether something is wrong that I don't understand.

FYI, y is alcohol consumption (mL) in the past 6 months, and I would like to know how y changes over the 2 years in relation to scale01. The partial output below shows the estimated y for the 2 years from the same GEE.

Please enlighten me on this. Thank you very much.

Year Least Squares Means

Year | Estimate | Standard Error | z Value | Pr > |z| | Alpha | Lower | Upper |
2013 | 4661.38 | 352.47 | 13.22 | <.0001 | 0.05 | 3970.55 | 5352.22 |
2014 | 3448.69 | 221.80 | 15.55 | <.0001 | 0.05 | 3013.98 | 3883.41 |

Differences of Year Least Squares Means

Year | _Year | Estimate | Standard Error | z Value | Pr > |z| | Alpha | Lower | Upper |
2013 | 2014 | 1212.69 | 261.05 | 4.65 | <.0001 | 0.05 | 701.04 | 1724.35 |

04-22-2015 01:33 PM

These are not unreasonable standard errors at all. The lower and upper confidence bounds are computed from the standard errors (estimate plus or minus about 1.96 standard errors at alpha = 0.05), and you get a range of about 4 to 5.4 liters in 2013 and 3 to 3.9 liters in 2014. These seem reasonable on all counts.
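As a worked check of how the bounds relate to the standard errors, the sketch below recomputes the 95% limits for the Year LS-means as estimate plus or minus z times the standard error. It assumes Wald-type limits with the standard normal quantile, which is consistent with the z statistics in the output; the dataset name ci_check is hypothetical.

```sas
/* Illustrative check only: Wald 95% limits from estimate and SE.
   ci_check is a hypothetical dataset name. */
data ci_check;
  input year estimate se;
  z     = probit(0.975);        /* standard normal quantile, about 1.96 */
  lower = estimate - z * se;
  upper = estimate + z * se;
  format lower upper 10.2 z 6.4;
  datalines;
2013 4661.38 352.47
2014 3448.69 221.80
;
run;

proc print data=ci_check noobs;
run;
```

The computed limits agree with the Lower and Upper columns in the posted output up to rounding of the displayed estimates and standard errors.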

Perhaps I am misunderstanding--about what value would you expect the standard error to be?

Steve Denham

04-23-2015 01:21 AM

Hi Steve.

Thanks for your response again.

I guess I don't understand the meaning of standard error very well.

Can I please also ask whether my continuous response variable has to be approximately normally distributed for the results to be valid?

I have read that GEE has weaker distributional assumptions, hence I did not use a transformed y for the GEE. Nevertheless, I tried a log transformation, but a square root transformation does a better job. If I use it, though, another problem arises in terms of understanding and interpreting the results.

Perhaps you could shed some light on this again? Thank you very much.

04-24-2015 08:58 AM

One of the true advantages of GENMOD (and GLIMMIX) is the ability to specify the distribution that applies to the data (or residuals, depending on specification). You are not restricted to a normal distribution--there is a variety to choose from. Your choice will depend on the process (in a probabilistic sense) by which the data are generated. If you don't have a good idea on this, an empirical approach can be taken.

For instance, you could specify a "square root" transformation through the use of the FWDLINK and INVLINK statements:

fwdlink link = sqrt(_MEAN_);
invlink ilink = (_XBETA_)*(_XBETA_);

Or through the use of programming statements, you could create your own distribution. This, however, is not for the inexperienced.
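As an illustration of picking one of the built-in distributions instead of a user-defined link, here is a sketch of the same model with the DIST= and LINK= options on the MODEL statement. The gamma distribution with a log link is only an assumed choice for a positive, right-skewed response; nothing in this thread establishes that it suits these data, and a gamma model is not usable if y contains zeros.

```sas
/* Sketch only: same model structure as the original post, with an
   assumed gamma distribution and log link (not a recommendation).
   Requires y > 0 for every observation. */
proc genmod data=long;
  class ID year scale01;
  model y = year scale01 scale01*year / dist=gamma link=log type3;
  repeated subject=ID / type=unstr covb corrw;
  lsmeans scale01*year / ilink cl;   /* ilink adds means on the original scale */
run;
```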

Steve Denham

04-24-2015 09:42 AM

Hi Steve.

Thank you very much for your valuable information.

I'll go and explore it a bit more.

05-04-2015 02:42 AM

Hi Steve.

How are you?

I tried what you suggested by specifying the FWDLINK and INVLINK statements, and the estimates changed dramatically.

How would I check which GEE is more appropriate?

Thank you very much.

proc genmod data=long;
  class ID year scale01;
  fwdlink link = sqrt(_MEAN_);
  invlink ilink = (_XBETA_)*(_XBETA_);
  model y = year scale01 scale01*year / type3;
  repeated subject=ID / type=unstr covb corrw;
  lsmeans scale01*year / cl;
run;

Year*rAttitudes05 Least Squares Means

Year | scale01 | Estimate | Standard Error | z Value | Pr > |z| | Alpha | Lower | Upper |
2013 | 1=Disagree a lot | 50.181 | 1.9817 | 25.32 | <.0001 | 0.05 | 46.297 | 54.065 |
2013 | 2=Disagree a little | 56.226 | 2.1635 | 25.99 | <.0001 | 0.05 | 51.985 | 60.466 |
2013 | 3=Neither agree nor disagree | 59.317 | 4.1629 | 14.25 | <.0001 | 0.05 | 51.157 | 67.476 |
2013 | 4=Agree a little | 66.594 | 2.0722 | 32.14 | <.0001 | 0.05 | 62.533 | 70.656 |
2013 | 5=Agree a lot | 98.358 | 8.3331 | 11.8 | <.0001 | 0.05 | 82.025 | 114.69 |
2014 | 1=Disagree a lot | 47.24 | 1.9806 | 23.85 | <.0001 | 0.05 | 43.358 | 51.122 |
2014 | 2=Disagree a little | 50.945 | 1.9992 | 25.48 | <.0001 | 0.05 | 47.026 | 54.863 |
2014 | 3=Neither agree nor disagree | 55.799 | 6.4372 | 8.67 | <.0001 | 0.05 | 43.182 | 68.416 |
2014 | 4=Agree a little | 61.939 | 2.2397 | 27.66 | <.0001 | 0.05 | 57.549 | 66.329 |
2014 | 5=Agree a lot | 73.936 | 5.0621 | 14.61 | <.0001 | 0.05 | 64.014 | 83.857 |

05-08-2015 10:32 AM

Umm, I don't think they are all that different on the original scale. Try this:

lsmeans scale01*year / ilink cl;

This should expand the display to include the mean on the original (untransformed) scale.
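To see roughly what the original-scale values look like without rerunning the model, the sketch below squares the square-root-scale LS-means and their confidence limits from the posted output, since squaring is the inverse of the square root link. This assumes the ILINK option simply applies the inverse link to the estimate and its limits; the dataset and variable names (sqrt_back, level, mean_orig, lower_orig, upper_orig) are hypothetical.

```sas
/* Illustrative back-transformation only: square the sqrt-scale
   LS-means and limits. Squaring preserves order here because all
   values are positive. Names are hypothetical. */
data sqrt_back;
  input year level est lower upper;
  mean_orig  = est**2;
  lower_orig = lower**2;
  upper_orig = upper**2;
  format mean_orig lower_orig upper_orig 10.1;
  datalines;
2013 1 50.181 46.297 54.065
2013 2 56.226 51.985 60.466
2013 3 59.317 51.157 67.476
2013 4 66.594 62.533 70.656
2013 5 98.358 82.025 114.69
2014 1 47.24 43.358 51.122
2014 2 50.945 47.026 54.863
2014 3 55.799 43.182 68.416
2014 4 61.939 57.549 66.329
2014 5 73.936 64.014 83.857
;
run;

proc print data=sqrt_back noobs;
run;
```

For example, the first row gives 50.181 squared, about 2518, which can be set against 3151.32 from the identity-link model.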

Now, as far as selecting the proper distribution goes: that is art as much as science. Plots of data, examination of residuals, consideration of the physical processes that generate the data, and interpretability all enter in. The usual things, like comparing various information criteria, are not so useful, as they depend on the form of the data, and once things are "transformed" as in a generalized linear model, it is like comparing apples to watermelons.

One thing that might tell you how well things are fitting is the width of the confidence intervals on the original scale. Narrower means a more precise estimate, but again that is "art" and not rigorous, and it tells you little about the accuracy of the estimation. And be aware that changing the distribution will have drastic effects on the location estimate.
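A rough way to carry out this comparison with the numbers already posted is sketched below: it computes the confidence interval width for each cell from the identity-link model and from the square-root-link model after squaring its limits. The back-transformation by squaring and all names (width_check, level, id_lower, id_upper, sq_lower, sq_upper, width_id, width_sqrt) are assumptions for illustration, and a narrower interval is not by itself evidence of a better model.

```sas
/* Illustrative comparison only: CI widths on the original scale.
   id_* are limits from the identity-link model; sq_* are limits on
   the square-root scale, squared here to return to the original
   scale. Names are hypothetical. */
data width_check;
  input year level id_lower id_upper sq_lower sq_upper;
  width_id   = id_upper - id_lower;
  width_sqrt = sq_upper**2 - sq_lower**2;
  format width_id width_sqrt 10.1;
  datalines;
2013 1 2123.4 4179.3 46.297 54.065
2013 2 2814.2 4846.6 51.985 60.466
2013 3 2722.8 5251.4 51.157 67.476
2013 4 3976.3 5607.3 62.533 70.656
2013 5 6676 13364 82.025 114.69
2014 1 1806.5 3693.7 43.358 51.122
2014 2 2227.1 4068 47.026 54.863
2014 3 1916.1 5114.8 43.182 68.416
2014 4 3345.9 4987.3 57.549 66.329
2014 5 4220.5 7289.1 64.014 83.857
;
run;

proc print data=width_check noobs;
  var year level width_id width_sqrt;
run;
```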

Steve Denham