01-19-2012 06:26 PM
I'm working with a dataset of litter depth and dry mass whose residuals are normally distributed once depth is natural-log transformed and mass is square-root transformed. I'm including a random block effect in my analysis, so I need to use PROC MIXED.
I know how to back-transform the LS mean estimates themselves, using the equation
mn2 = exp(estimate + (.5 * residual_var) )
for log-transformed data.
I have also read that the following equation should be used to back-transform means for square-root transformed data (is this correct?):
mn2 = estimate^2 + (n-1)s^2/n
But my question is, how do I back-transform the LSMEAN standard errors, for both log- and sqrt-transformed data? I've searched all over, and can't find a clear answer to this question. Some sources even say it can't be done, yet I see it done in the literature so I know there must be a way.
Thanks in advance.
01-20-2012 07:55 AM
I replied on SAS-L, and since folks don't always read both, I have recovered that reply from the archive:
This looks like an opportunity to use PROC GLIMMIX with the LINK= option on the MODEL statement: LINK=LOG for depth and LINK=POWER(0.5) for mass. Then add the ILINK option to the LSMEANS statement, and the output will include the estimates and their standard errors on both the transformed and the original scale. The documentation reports that the standard errors on the inverse-linked scale are computed by the delta method.
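A minimal sketch of that suggestion, assuming a data set named LITTER with a treatment factor TRT and a blocking factor BLOCK (all three names are placeholders, since the original post does not give them):

```sas
/* Hypothetical names: data set LITTER, fixed effect TRT, random effect BLOCK */

/* Depth: log link, normal distribution by default */
proc glimmix data=litter;
   class block trt;
   model depth = trt / link=log;
   random block;                 /* random block effect */
   lsmeans trt / ilink cl;       /* ILINK adds estimates on the original scale */
run;

/* Mass: square-root link, written as a power link with exponent 0.5 */
proc glimmix data=litter;
   class block trt;
   model mass = trt / link=power(0.5);
   random block;
   lsmeans trt / ilink cl;
run;
```

With ILINK, the LS-means table should carry extra columns for the mean and its standard error on the data scale next to the link-scale estimates.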
I hope this helps.
Message was edited by: Steve Denham
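For readers who want to see what the delta method amounts to for these two links, here is a rough hand calculation in a DATA step. The values of EST and SE are made up for illustration; they stand in for an LS-mean and its standard error on the link scale. The approximation used is SE(mean) ≈ |d(mean)/d(est)| × SE(est).

```sas
/* Illustrative values only: EST and SE are not taken from any real analysis */
data backtransform;
   est = 1.20;
   se  = 0.15;

   /* Log link: mean = exp(est), derivative = exp(est) */
   mean_log = exp(est);
   se_log   = exp(est) * se;

   /* Square-root link: mean = est**2, derivative = 2*est */
   mean_sqrt = est**2;
   se_sqrt   = abs(2 * est) * se;
run;

proc print data=backtransform noobs;
run;
```

Note that this inverts the link at the estimate itself, so the log-link mean is exp(est) and does not include the 0.5 × residual variance adjustment from the first post.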
10-14-2014 02:59 PM
I was wondering whether a distribution has to be specified in the MODEL statement.
Proc glimmix data=data;
model y= x1 x2 / link=power(0.5) dist=;
I know that if DIST= is not specified, PROC GLIMMIX assumes the distribution is Gaussian. Since the data are not normal in this case, how do I determine the approximate distribution of the data?
10-15-2014 09:52 AM
I apologize for the confusion. This is from the SAS documentation of PROC GLIMMIX:
"If you do not specify a distribution, the GLIMMIX procedure defaults to the normal distribution for continuous response variables". Does that mean normality of marginal distribution (y) or conditional distribution (residuals)?
I contacted a SAS technical support representative about specifying a distribution in this case. I was told to check the histogram from PROC UNIVARIATE, see whether a reasonable distribution can be found, and specify it in the DIST= option, even though I have already used a link function of POWER(-0.5) (the lambda value of -0.5 was obtained from a Box-Cox transformation).
Below is my code.
%DO I=1 %TO 5;
   /* Take the I-th blank-delimited variable name from the list in &VARS */
   %LET VAR=%SCAN(&VARS, &I, %STR( ));
   PROC GLIMMIX DATA=HFD.NEW_ECO_DATA NOBOUND PLOTS=RESIDUALPANEL(CONDITIONAL MARGINAL);
      CLASS DIET DRUG RAT__ PUP__;
      MODEL &VAR = DIET DRUG DIET*DRUG / SOLUTION DDFM=BW LINK=POWER(-0.5) E;
      /* Random intercept for each rat */
      RANDOM INT / SUBJECT=RAT__;
      LSMEANS DIET / CL DIFF ILINK;
      LSMEANS DRUG / CL DIFF ILINK;
      LSMEANS DIET*DRUG / CL DIFF ILINK;
   RUN;
%END;
Thank you very much.
I suppose it would be the distribution of the residuals. The documentation also mentions ERROR=keyword.
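The histogram check suggested by technical support could be sketched roughly as follows. The data set name MYDATA and the response Y are placeholders, and the three fitted curves are only example candidates; the GAMMA and LOGNORMAL fits assume a strictly positive response.

```sas
/* Hypothetical names: data set MYDATA, response variable Y */
proc univariate data=mydata;
   var y;
   /* Overlay fitted normal, gamma, and lognormal densities on the histogram */
   histogram y / normal gamma lognormal;
run;
```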
10-15-2014 02:30 PM
I would use the code that you have and not specify a distribution. Use of the link= option is equivalent to pre-transforming the data using the function specified in the link in order to normalize the residuals. (I assume that your Box-Cox derived link was determined from the residuals).
10-15-2014 06:13 PM
Thanks, Steve. I was also wondering whether, in the fit statistics of a GLIMMIX model, the ratio of the generalized chi-square to its degrees of freedom (Gener. Chi-Square / DF) has to be close to one.
10-16-2014 01:12 PM
I have several variables; for some the ratio is close to 1 and for some it is not (for example, 50). What could be the reason for getting such high values even after transforming to induce normality in the residuals?
10-17-2014 12:51 PM
Probably an unaccounted-for source of variability. Is there any possibility of cage-level or room-level effects? Systematic unidirectional "outliers" can also have this effect.
And then, it may be that even after fitting Box-Cox to the residuals, the basic model is missing something that I am not seeing right away.
10-17-2014 02:54 PM
For normal data (or any distribution with a free scale parameter), Gen. chi-squared/df does not need to be 1. It can be any value. For simple situations (variance component models), this statistic is the same as the residual variance.
10-17-2014 02:57 PM
I am going to stand over in the corner for a while longer, and do some studying. I should have known this, and I didn't. Thanks, Larry.
10-17-2014 04:24 PM
This paper mentions that a Gen. chi-squared/df greater than 1 means overdispersion for the binomial distribution.
Does a value greater than 1 mean overdispersion in this case, or something else?
10-17-2014 04:33 PM
Your distribution is normal, which is in the exponential family. But the binomial and Poisson do not have a free scale (variance) parameter. The normal, gamma, beta, and others do have a scale parameter. This is a HUGE difference, for many reasons. Find the several articles/books by Walter Stroup. Overdispersion is a concept only for distributions without a free scale parameter. My earlier response is correct.