Hi,
I'm working with a dataset of litter depth and dry mass whose residuals are normally distributed after a natural-log transformation (depth) or a square-root transformation (mass). I'm including a random block effect in my analysis, so I need to use PROC MIXED.
I know how to back-transform the LS mean estimates themselves, using the equation
mn2 = exp(estimate + (.5 * residual_var) )
for log-transformed data.
I have also read that the following equation should be used to back-transform means for square-root transformed data (is this correct?):
mn2 = estimate^2 + (n-1)s^2/n
But my question is, how do I back-transform the LSMEAN standard errors, for both log- and sqrt-transformed data? I've searched all over, and can't find a clear answer to this question. Some sources even say it can't be done, yet I see it done in the literature so I know there must be a way.
Thanks in advance.
Cheers,
Nicole Michel
On SAS-L, I replied, and since folks don't always read both, I tried to recover it from the archive:
This looks like an opportunity to use PROC GLIMMIX and the LINK= option on the MODEL statement. For depth, use LINK=LOG, and for mass, use LINK=POWER(0.5). Then, in the LSMEANS statement, use the ILINK option, and the output will include the estimates and their standard errors on both the transformed and the original scale. The documentation reports that the standard errors on the inverse linked scale are computed by the delta method.
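As a rough sketch of that suggestion (the data set name LITTER and the variables BLOCK, TRT, DEPTH and MASS are placeholders, not names from the original post), the two fits might look something like this:

```sas
/* Hypothetical names: data set LITTER; variables BLOCK, TRT, DEPTH, MASS */

/* Depth: log link, random block effect */
proc glimmix data=litter;
  class block trt;
  model depth = trt / link=log;        /* DIST= omitted, so the normal default applies */
  random block;
  lsmeans trt / ilink cl;              /* ILINK adds estimates on the original scale */
run;

/* Mass: square-root (power 0.5) link, random block effect */
proc glimmix data=litter;
  class block trt;
  model mass = trt / link=power(0.5);
  random block;
  lsmeans trt / ilink cl;
run;
```

With ILINK, the LSMEANS table adds the inverse-linked mean and its standard error. As a first-order (delta method) approximation, that standard error works out to about exp(estimate) * SE for the log link and about 2 * estimate * SE for the square-root link, where estimate and SE are the values on the link scale.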
I hope this helps.
Steve Denham
Hi Steve,
I was wondering whether a distribution has to be specified in the MODEL statement.
Proc glimmix data=data;
class sub;
model y= x1 x2 / link=power(0.5) dist=;
random sub;
run;
I know that if DIST= is not specified, PROC GLIMMIX assumes the distribution is Gaussian. Since the data are not normal in this case, how do I determine the approximate distribution of the data?
No need to specify a distribution, since under the links given the residuals are normally distributed.
Steve Denham
Hi Steve,
I apologize for the confusion. This is from the SAS documentation for PROC GLIMMIX:
"If you do not specify a distribution, the GLIMMIX procedure defaults to the normal distribution for continuous response variables". Does that mean normality of marginal distribution (y) or conditional distribution (residuals)?
I contacted a SAS technical support representative about specifying a distribution in this case. I was told to check the histogram in PROC UNIVARIATE, see whether a reasonable distribution can be found, and specify it in the DIST= option, even though I have used a link function of POWER(-0.5) (the lambda value of -0.5 was obtained from a Box-Cox transformation).
Below is my code.
%MACRO GLMX1;
  /* &VARS is assumed to be a space-delimited list of response variables defined before the call */
  %DO I=1 %TO 5;
    %LET VAR=%SCAN(&VARS, &I, %STR( ));
    PROC GLIMMIX DATA=HFD.NEW_ECO_DATA NOBOUND PLOTS=RESIDUALPANEL(CONDITIONAL MARGINAL);
      CLASS DIET DRUG RAT__ PUP__;
      MODEL &VAR = DIET DRUG DIET*DRUG / SOLUTION DDFM=BW LINK=POWER(-0.5) E;
      /* Random intercept for each rat */
      RANDOM INT / SUBJECT=RAT__;
      LSMEANS DIET / CL DIFF ILINK;
      LSMEANS DRUG / CL DIFF ILINK;
      LSMEANS DIET*DRUG / CL DIFF ILINK;
    RUN;
  %END;
%MEND GLMX1;
%GLMX1;
Thank you very much.
I suppose it would be the distribution of the residuals. The documentation also lists ERROR=keyword as an alias:
DISTRIBUTION=keyword
DIST=keyword
D=keyword
ERROR=keyword
E=keyword
specifies the built-in (conditional) probability distribution of the data.
Regards,
I would use the code that you have and not specify a distribution. Use of the link= option is equivalent to pre-transforming the data using the function specified in the link in order to normalize the residuals. (I assume that your Box-Cox derived link was determined from the residuals).
Steve Denham
Thanks, Steve. I was also wondering whether, in the fit statistics of the GLIMMIX model, the ratio of the generalized chi-square to its degrees of freedom (Gener. Chi-Square / DF) has to be close to one?
It should. What kind of values are you getting?
Steve Denham
Hi Steve,
I have several variables; for some the ratio is close to 1 and for some it is not (as high as 50). What could be the reason for getting such high values even after transforming to induce normality in the residuals?
Thanks !!!
Probably an unaccounted for source of variability--any possibility of cage-level or room-level effects? Systematic unidirectional "outliers" can also have this effect.
And then, it may be that even after fitting Box-Cox to the residuals, the basic model is missing something that I am not seeing right away.
Steve Denham
For normal data (or any distribution with a free scale parameter), Gen. chi-squared/df does not need to be 1. It can be any value. For simple situations (variance component models), this statistic is the same as the residual variance.
I am going to stand over in the corner for a while longer, and do some studying. I should have known this, and I didn't. Thanks, Larry.
Steve Denham
Hello Ivm,
This paper says that a Gen. chi-squared/df greater than 1 means overdispersion for the binomial distribution.
http://www2.sas.com/proceedings/sugi30/196-30.pdf
Does a value greater than 1 mean overdispersion in this case, or something else?
Thanks !!!
Your distribution is normal, which is in the exponential family. But the binomial and Poisson do not have a free scale (variance) parameter. The normal, gamma, beta, and others do have a scale parameter. This is a HUGE difference, for many reasons. Find the several articles/books by Walter Stroup. Overdispersion is a concept only for distributions without a free scale parameter. My earlier response is correct.
Thanks Ivm and Steve. This forum is really helpful.
Regards,