Re: Large standard errors from GEE

Miracle · Posted 04-21-2015 01:25 AM

Dear all,

How are you?

I'm writing to ask for help with the large standard errors from my GEE.

Is it correct to report the result given the very large standard errors?

Your insight is greatly appreciated.

Kind regards,

KC

proc genmod data=long;
    class ID year scale01;
    model y=year scale01 scale01*year / type3;
    repeated subject=ID / type=unstr covb corrw;
    lsmeans scale01*year / cl;
run;

Year*scale01 Least Squares Means
Year	scale01	Estimate	Standard Error	z Value	Pr > |z|	Alpha	Lower	Upper
2013	1=Disagree a lot	3151.32	524.47	6.01	<.0001	0.05	2123.4	4179.3
2013	2=Disagree a little	3830.37	518.49	7.39	<.0001	0.05	2814.2	4846.6
2013	3=Neither agree nor disagree	3987.09	645.05	6.18	<.0001	0.05	2722.8	5251.4
2013	4=Agree a little	4791.8	416.07	11.52	<.0001	0.05	3976.3	5607.3
2013	5=Agree a lot	10020	1706.07	5.87	<.0001	0.05	6676	13364
2014	1=Disagree a lot	2750.11	481.45	5.71	<.0001	0.05	1806.5	3693.7
2014	2=Disagree a little	3147.5	469.63	6.7	<.0001	0.05	2227.1	4068
2014	3=Neither agree nor disagree	3515.44	816.01	4.31	<.0001	0.05	1916.1	5114.8
2014	4=Agree a little	4166.6	418.75	9.95	<.0001	0.05	3345.9	4987.3
2014	5=Agree a lot	5754.81	782.82	7.35	<.0001	0.05	4220.5	7289.1

SteveDenham · Posted 04-21-2015 08:34 AM

How did you conclude that the standard errors are large? Most are an order of magnitude less than the estimate, which for almost all problems is a pretty good fit.
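To put rough numbers on that, here is the ratio of standard error to estimate for three of the cells in the table above (values taken from the posted output, rounded to two significant figures):

```latex
\begin{align*}
\text{2013, Disagree a lot:} \quad & \frac{524.47}{3151.32} \approx 0.17 \\
\text{2013, Agree a little:} \quad & \frac{416.07}{4791.8} \approx 0.087 \\
\text{2014, Neither agree nor disagree:} \quad & \frac{816.01}{3515.44} \approx 0.23
\end{align*}
```

So the standard errors run from roughly 9% to 23% of the estimates in these cells.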

What is the response variable here? I notice a strong trend with increasing scale01 score. Perhaps you could look at the trend in the two years, and have greater precision for your questions.

Steve Denham

Miracle · Posted 04-21-2015 08:50 PM

Hi Steve.

Thanks for your response.

As far as I understand, the standard error is a measure of estimate precision.

I have never seen such large standard errors, so I am puzzled and wonder whether something is wrong that I don't understand.

FYI, y is the alcohol consumption (mL) in the past 6 months, and I would like to know how y changes over the 2 years in relation to scale01. The partial output below shows the estimated y for the 2 years from the same GEE.

Please enlighten me on this. Thank you very much.

Year Least Squares Means
Year	Estimate	Standard Error	z Value	Pr > |z|	Alpha	Lower	Upper
2013	4661.38	352.47	13.22	<.0001	0.05	3970.55	5352.22
2014	3448.69	221.80	15.55	<.0001	0.05	3013.98	3883.41

Differences of Year Least Squares Means
Year	_Year	Estimate	Standard Error	z Value	Pr > |z|	Alpha	Lower	Upper
2013	2014	1212.69	261.05	4.65	<.0001	0.05	701.04	1724.35

SteveDenham · Posted 04-22-2015 01:33 PM

These are not unreasonable standard errors at all. The lower and upper confidence bounds are computed from the standard errors, and you get a range of about 4 to 5.4 liters in 2013 and 3 to 3.9 liters in 2014. These seem reasonable on all counts.
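For the 2013 mean, the usual Wald arithmetic reproduces the posted output. This is a sketch assuming the normal critical value of about 1.96 for alpha = 0.05; the last digit can differ slightly from the SAS output because of rounding:

```latex
\begin{align*}
z &= \frac{\text{Estimate}}{\text{SE}} = \frac{4661.38}{352.47} \approx 13.22 \\
\text{Lower} &= 4661.38 - 1.96 \times 352.47 \approx 3970.5 \\
\text{Upper} &= 4661.38 + 1.96 \times 352.47 \approx 5352.2
\end{align*}
```

In other words, the standard error only sets the width of the interval around the estimate; an SE of about 350 mL on a mean of about 4,660 mL is not large in relative terms.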

Perhaps I am misunderstanding--about what value would you expect the standard error to be?

Steve Denham

Miracle · Posted 04-23-2015 01:21 AM

Hi Steve.

Thanks for your response again.

I guess I don't understand the meaning of standard error very well.

Can I please also ask whether my continuous response variable has to be normally distributed for the result to be valid?

I have read that GEE has weaker distributional assumptions, hence I did not use the transformed y for the GEE. Nevertheless, I tried a log transformation, but a square root transformation does a better job. If I use it, though, another problem arises in terms of understanding and interpreting the result.

Perhaps you could shed some light on this again? Thank you very much.

SteveDenham · Posted 04-24-2015 08:58 AM

One of the true advantages of GENMOD (and GLIMMIX) is the ability to specify the distribution that applies to the data (or residuals, depending on specification). You are not restricted to a normal distribution--there are a variety to choose from. Your choice will depend on the process (in a probabilistic sense) by which the data are generated. If you don't have a good idea on this, an empirical approach can be taken.

For instance, you could specify a "square root" transformation through the use of the FWDLINK and INVLINK statements:

fwdlink link = sqrt(_MEAN_);
invlink ilink = (_XBETA_)*(_XBETA_);

Or through the use of programming statements, you could create your own distribution. This, however, is not for the inexperienced.
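To illustrate the first point above, that GENMOD lets you name a distribution directly, here is a minimal sketch that swaps the default normal distribution for a gamma with a log link. The gamma/log choice is only an assumption for illustration: it suits right-skewed, strictly positive responses, and observations with y = 0 would not be valid for it. The data set and variable names are the ones from the original post.

```sas
/* Sketch only: gamma with log link is an assumed choice for illustration, */
/* not a recommendation made in this thread. Requires y > 0.               */
proc genmod data=long;
    class ID year scale01;
    model y = year scale01 scale01*year / dist=gamma link=log type3;
    repeated subject=ID / type=unstr covb corrw;
    /* ILINK adds the estimates back-transformed to the original scale */
    lsmeans scale01*year / ilink cl;
run;
```

Whether this is a better description of the data than the normal or square-root-link fits would still need to be judged from plots and residuals.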

Steve Denham

Miracle · Posted 04-24-2015 09:42 AM

Hi Steve.

Thank you very much for your valuable information.

I'll go and explore it a bit more.

Miracle · Posted 05-04-2015 02:42 AM

Hi Steve.

How are you?

I tried what you suggested and specified the fwdlink and invlink statements, and the estimates changed dramatically.

How would I check which GEE is more appropriate?

Thank you very much.

proc genmod data=long;
    class ID year scale01;
    fwdlink link = sqrt(_MEAN_);
    invlink ilink = (_XBETA_)*(_XBETA_);
    model y=year scale01 scale01*year / type3;
    repeated subject=ID / type=unstr covb corrw;
    lsmeans scale01*year / cl;
run;

Year*rAttitudes05 Least Squares Means
Year	scale01	Estimate	Standard Error	z Value	Pr > |z|	Alpha	Lower	Upper
2013	1=Disagree a lot	50.181	1.9817	25.32	<.0001	0.05	46.297	54.065
2013	2=Disagree a little	56.226	2.1635	25.99	<.0001	0.05	51.985	60.466
2013	3=Neither agree nor disagree	59.317	4.1629	14.25	<.0001	0.05	51.157	67.476
2013	4=Agree a little	66.594	2.0722	32.14	<.0001	0.05	62.533	70.656
2013	5=Agree a lot	98.358	8.3331	11.8	<.0001	0.05	82.025	114.69
2014	1=Disagree a lot	47.24	1.9806	23.85	<.0001	0.05	43.358	51.122
2014	2=Disagree a little	50.945	1.9992	25.48	<.0001	0.05	47.026	54.863
2014	3=Neither agree nor disagree	55.799	6.4372	8.67	<.0001	0.05	43.182	68.416
2014	4=Agree a little	61.939	2.2397	27.66	<.0001	0.05	57.549	66.329
2014	5=Agree a lot	73.936	5.0621	14.61	<.0001	0.05	64.014	83.857

SteveDenham · Posted 05-08-2015 10:32 AM

Umm, I don't think they are all that different on the original scale. Try this:

lsmeans scale01*year / ilink cl;

This should expand the display to include the mean on the original (untransformed) scale.
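As a rough check by hand: the LS-means from the square-root-link model are on the square-root scale, so squaring them gives the implied means in mL. Using three cells from the output posted above (rounded to one decimal, with the identity-link estimate from the first model alongside):

```latex
\begin{align*}
\text{2013, Disagree a lot:} \quad & 50.181^2 \approx 2518.1 \quad (\text{first model: } 3151.32) \\
\text{2013, Agree a lot:} \quad & 98.358^2 \approx 9674.3 \quad (\text{first model: } 10020) \\
\text{2014, Agree a lot:} \quad & 73.936^2 \approx 5466.5 \quad (\text{first model: } 5754.81)
\end{align*}
```

The confidence limits back-transform the same way; for the first cell, 46.297 and 54.065 square to about 2143.4 and 2923.0.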

Now, as far as selecting the proper distribution goes: that is art as much as science. Plots of data, examination of residuals, consideration of the physical processes that generate the data, and interpretability all enter in. The usual things, like comparing various information criteria, are not so useful, as they depend on the form of the data, and once things are "transformed" as in a generalized linear model, it is like comparing apples to watermelons.

One thing that might tell you how well things are fitting is the width of the confidence intervals on the original scale. Narrower means a more precise estimate--but again that is "art" and not rigorous, and tells you little about the accuracy of the estimation. And be aware that changing the distribution will have drastic effects on the location estimate.

Steve Denham
