michel
Calcite | Level 5

Hi,

I'm working with a dataset of litter depth and dry mass whose residuals are normally distributed after a natural-log transformation (depth) or a square-root transformation (mass).  I'm including a random block effect in my analysis, so I need to use PROC MIXED.

I know how to back-transform the LS mean estimates themselves, using the equation

mn2 = exp(estimate + (.5 * residual_var) )  

for log-transformed data.

I have also read that the following equation should be used to back-transform means for square-root transformed data (is this correct?):

mn2 = estimate^2 + (n-1)s^2/n
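
As a check on the two formulas above, here is a short sketch of where they come from. It assumes the transformed response Z is exactly normal with mean \mu and variance \sigma^2; the symbols Z, \mu and \sigma^2 are introduced here for illustration only.

$$
Z = \log Y \sim N(\mu, \sigma^2) \;\Rightarrow\; E[Y] = \exp\!\left(\mu + \tfrac{1}{2}\sigma^2\right)
$$

$$
Z = \sqrt{Y} \sim N(\mu, \sigma^2) \;\Rightarrow\; E[Y] = E[Z^2] = \mu^2 + \sigma^2
$$

Under that assumption the square-root formula has the same shape as the log one: the estimate is back-transformed and a variance term is added, with (n-1)s^2/n acting as one possible estimate of \sigma^2. Which variance estimate to plug in for a mixed model (residual only, or residual plus block) is a separate choice that this sketch does not settle.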


But my question is, how do I back-transform the LSMEAN standard errors, for both log- and sqrt-transformed data? I've searched all over, and can't find a clear answer to this question. Some sources even say it can't be done, yet I see it done in the literature so I know there must be a way.


Thanks in advance.


Cheers,

Nicole Michel

SteveDenham
Jade | Level 19

I replied to this on SAS-L, and since folks don't always read both, here is that reply as recovered from the archive:

This looks like an opportunity to use PROC GLIMMIX and the LINK= option of the MODEL statement. For depth use LINK=LOG, and for mass use LINK=POWER(0.5). Then in the LSMEANS statement, use the ILINK option, and the output will include the estimates and their standard errors on both the transformed and the original scale. The documentation reports that the standard errors on the inverse linked scale are computed by the delta method.
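
A minimal sketch of that suggestion follows. The data set name LITTER and the variable names TRT, BLOCK, DEPTH and MASS are hypothetical, since the original post does not give them.

```sas
/* Hypothetical names: data set LITTER, fixed effect TRT, random block BLOCK */
proc glimmix data=litter;
  class trt block;
  model depth = trt / link=log;        /* DIST= omitted, so normal is assumed */
  random block;
  lsmeans trt / ilink cl;              /* ILINK adds the inverse-linked mean and its SE */
run;

proc glimmix data=litter;
  class trt block;
  model mass = trt / link=power(0.5);  /* square-root link */
  random block;
  lsmeans trt / ilink cl;
run;
```

For reference, the first-order delta method approximates the standard error of an inverse-linked estimate \hat\mu = g^{-1}(\hat\eta) as

$$
\operatorname{SE}(\hat\mu) \approx \left| \frac{d\,g^{-1}}{d\eta}(\hat\eta) \right| \operatorname{SE}(\hat\eta)
$$

so for the log link this is \exp(\hat\eta)\,\operatorname{SE}(\hat\eta), and for the power(0.5) link, where \mu = \eta^2, it is 2\,|\hat\eta|\,\operatorname{SE}(\hat\eta).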

I hope this helps.

Steve Denham

Message was edited by: Steve Denham

kaushal2040
Calcite | Level 5

Hi Steve,

I was wondering if distribution has to be specified in the model statement.

Proc glimmix data=data;

class sub;

model y= x1 x2 / link=power(0.5) dist=;

random sub;

run;

I know that if DIST= is not specified, PROC GLIMMIX assumes a Gaussian distribution.  Since the data are not normal in this case, how can I determine an appropriate distribution for the data?

SteveDenham
Jade | Level 19

No need to specify a distribution, since under the links given the residuals are normally distributed.
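
Spelled out with the names from the post above (data set DATA, variables Y, X1, X2, SUB), a sketch of this advice might look as follows. DIST=NORMAL is written explicitly only for illustration; omitting DIST= should give the same fit for a continuous response.

```sas
/* Sketch only: names are taken from the post above, not from real data */
proc glimmix data=data;
  class sub;
  /* DIST=NORMAL is the default for a continuous response, shown here explicitly */
  model y = x1 x2 / link=power(0.5) dist=normal;
  random sub;
run;
```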

Steve Denham

kaushal2040
Calcite | Level 5

Hi Steve,

I apologize for the confusion.  This is from the SAS documentation for PROC GLIMMIX:

"If you do not specify a distribution, the GLIMMIX procedure defaults to the normal distribution for continuous response variables."  Does that mean normality of the marginal distribution (y) or of the conditional distribution (residuals)?

I contacted a SAS technical support representative about specifying the distribution in this case. I was told to check the histogram from PROC UNIVARIATE, see whether a reasonable distribution can be found, and specify it in the DIST= option, even though I have already used the link function POWER(-0.5) (the lambda value of -0.5 was obtained from a Box-Cox transformation).

Below is my code.

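/* Assumes &VARS was defined earlier (e.g. with %LET) as a space-separated list of at least five response variables */
%MACRO GLMX1;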

    %DO I=1  %TO 5;

         %LET VAR=%SCAN(&VARS,&I, ' ');

    PROC GLIMMIX DATA=HFD.NEW_ECO_DATA NOBOUND PLOTS=RESIDUALPANEL (CONDITIONAL MARGINAL);

      CLASS DIET DRUG RAT__ PUP__;

        MODEL &VAR=DIET DRUG DIET*DRUG/ SOLUTION DDFM=BW LINK=POWER(-0.5) E;

        RANDOM INT / SUBJECT=RAT__;

        LSMEANS  DIET/CL DIFF ILINK;

        LSMEANS  DRUG/CL DIFF ILINK;

        LSMEANS  DIET*DRUG/CL DIFF ILINK;

      RUN;

%END;

%MEND GLMX1;

%GLMX1;

     Thank you very much.

I suppose it would be the distribution of the residuals.  The documentation also lists ERROR=keyword as an alias:

DISTRIBUTION=keyword

DIST=keyword

D=keyword

ERROR=keyword

E=keyword

specifies the built-in (conditional) probability distribution of the data.


Regards,

SteveDenham
Jade | Level 19

I would use the code that you have and not specify a distribution.  Use of the link= option is equivalent to pre-transforming the data using the function specified in the link in order to normalize the residuals.  (I assume that your Box-Cox derived link was determined from the residuals).

Steve Denham

kaushal2040
Calcite | Level 5

Thanks, Steve.  I was also wondering whether, in the fit statistics of the GLIMMIX model, the ratio of the generalized chi-square to its degrees of freedom (Gener. Chi-Square / DF) has to be close to one?

SteveDenham
Jade | Level 19

It should.  What kind of values are you getting?

Steve Denham

kaushal2040
Calcite | Level 5

Hi Steve,

I have several variables; for some the ratio is close to 1 and for some it is not (for example, 50).   What could be the reason for getting such high values even after transforming to induce normality in the residuals?

Thanks !!!

SteveDenham
Jade | Level 19

Probably an unaccounted for source of variability--any possibility of cage-level or room-level effects?  Systematic unidirectional "outliers" can also have this effect.

And then, it may be that even after fitting Box-Cox to the residuals, the basic model is missing something that I am not seeing right away.

Steve Denham

lvm
Rhodochrosite | Level 12

For normal data (or any distribution with a free scale parameter),  Gen. chi-squared/df does not need to be 1. It can be any value. For simple situations (variance component models), this statistic is the same as the residual variance.

SteveDenham
Jade | Level 19

I am going to stand over in the corner for a while longer, and do some studying.  I should have known this, and I didn't.  Thanks, Larry.

Steve Denham

kaushal2040
Calcite | Level 5

Hello lvm,

This paper says that a Gen. chi-squared/df greater than 1 means overdispersion for the binomial distribution.

http://www2.sas.com/proceedings/sugi30/196-30.pdf

Does a value greater than 1 mean overdispersion in this case, or something else?

Thanks !!!

lvm
Rhodochrosite | Level 12

Your distribution is normal, which is in the exponential family. But the binomial and Poisson do not have a free scale (variance) parameter. The normal, gamma, beta, and others do have a scale parameter. This is a HUGE difference, for many reasons. Find the several articles/books by Walter Stroup. Overdispersion is a concept only for distributions without a free scale parameter. My earlier response is correct.

kaushal2040
Calcite | Level 5

Thanks lvm and Steve.  This forum is really helpful.

Regards,
