JamesLin
Fluorite | Level 6

I am conducting a two-sample test (1-way ANOVA with 2 treatments), and the goal is to estimate the ratio of cell means assuming that the data are lognormal. A simple approach is to log-transform the response and fit the model

log(Y) = b0 + b1 * X

and then estimate the ratio as

R = exp(b1).

However, that gives the ratio of geometric cell means rather than arithmetic cell means.
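A minimal sketch of that point, assuming simulated data (the seed, sample sizes, and lognormal parameters below are made up for illustration). With a 0/1 treatment indicator, the least-squares slope of log(Y) is the difference of the group means of log(Y), so exp(b1) reproduces the ratio of sample geometric means exactly, while the ratio of sample arithmetic means is generally a different number:

```python
import math
import random
import statistics


def mean_log(values):
    """Arithmetic mean of log(values)."""
    return statistics.fmean([math.log(v) for v in values])


def geometric_mean(values):
    """Geometric mean, computed as exp of the mean log."""
    return math.exp(mean_log(values))


# Hypothetical data: two treatments, common log-scale standard deviation.
rng = random.Random(0)
y1 = [rng.lognormvariate(1.0, 0.8) for _ in range(200)]
y2 = [rng.lognormvariate(1.5, 0.8) for _ in range(200)]

# OLS slope of log(Y) on a 0/1 indicator = difference of group means of log(Y).
b1 = mean_log(y2) - mean_log(y1)

ratio_from_slope = math.exp(b1)
ratio_geometric = geometric_mean(y2) / geometric_mean(y1)
ratio_arithmetic = statistics.fmean(y2) / statistics.fmean(y1)

print(ratio_from_slope, ratio_geometric, ratio_arithmetic)
```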

I assumed that if I fit a "proper" lognormal model using either gamlss in R or PROC GLIMMIX in SAS, I would get the ratio of arithmetic means, but both procedures produce the same slope as the log(Y) regression.

This is odd because when I use this approach with Poisson or Negative Binomial regression, I do get the ratio of arithmetic means. What am I missing?

Thanks, James

7 REPLIES
lvm
Rhodochrosite | Level 12

The lognormal is a bit of an "odd duck" in terms of distributions. You are basically saying that log(Y) is normal. As stated in the GLIMMIX User's Guide, the distribution fitted is "not the distribution of Y". Thus, the antilog is not the mean of Y, but is related to the mean of Y. You can get the required ratio of means using the normal distribution with a log link.

proc glimmix data=b;
  class trt; * two levels;
  model y = trt / s dist=normal link=log;
  lsmeans trt / cl diff ilink;
run;

Exponentiating the trt1 estimate will give you the ratio you are looking for. Exponentiating the difference of the link-scale lsmeans, exp(mu1-mu2), will give the same thing.

JamesLin
Fluorite | Level 6

I assume Normal with log link means that Y ~ N (mu, sigma) where mu = exp(x'b). That is, while the mean response is guaranteed to be positive, this distribution can still generate negative observations. It doesn't make much sense because my observations are always positive.
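To put a number on that concern, here is a small sketch of the probability that such a model assigns to negative values. The linear predictor and standard deviations are hypothetical, chosen only to show how the probability grows with sigma relative to the mean:

```python
import math
from statistics import NormalDist


def prob_negative(linear_predictor, sigma):
    """P(Y < 0) when Y ~ Normal(mean=exp(linear_predictor), sd=sigma)."""
    mean = math.exp(linear_predictor)
    return NormalDist(mu=mean, sigma=sigma).cdf(0.0)


# Hypothetical values: same mean exp(1.0), two different residual SDs.
p_small_sd = prob_negative(1.0, 0.5)
p_large_sd = prob_negative(1.0, 2.0)

print(p_small_sd, p_large_sd)
```

The mean is always positive under the log link, but the probability of a negative draw is only negligible when sigma is small compared with the mean.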

lvm
Rhodochrosite | Level 12

You asked about getting the ratio of the two means by using exp(b), and I showed you how to do it in GLIMMIX (works in GENMOD also). It will always work for the point estimate and positive means. But I did not say you should be doing this. As Steve wrote, you will have to use post-model fitting in a data step to get the means on the original scale if you choose log-normal for your distribution. Those means are not obtainable in the output.

SteveDenham
Jade | Level 19

I guess my first question would be: If the data are lognormally distributed, why would you want a ratio of the arithmetic means, knowing that the arithmetic means are biased?  The ratio of geometric means is at least something closer.  Note that the expected values and variances are not obtained by a simple exponentiation, and so a ratio of expected values is going to involve a few lines of data step programming.  See the documentation for the DIST= option of the MODEL statement, and search down below the table for the paragraphs on the lognormal distribution, where equations for the expected value and variance are given.
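As a rough illustration of that back-transformation, written in Python rather than a data step, the sketch below uses the standard lognormal moment formulas, E[Y] = exp(mu + sigma2/2) and Var[Y] = exp(2*mu + sigma2) * (exp(sigma2) - 1). The estimates mu1, mu2 and sigma2 are hypothetical placeholders for the log-scale means and residual variance a fitted model would report:

```python
import math


def lognormal_mean(mu, sigma2):
    """Expected value of Y when log(Y) ~ Normal(mu, sigma2)."""
    return math.exp(mu + sigma2 / 2.0)


def lognormal_variance(mu, sigma2):
    """Variance of Y when log(Y) ~ Normal(mu, sigma2)."""
    return math.exp(2.0 * mu + sigma2) * (math.exp(sigma2) - 1.0)


# Hypothetical log-scale estimates (not taken from any real fit).
mu1, mu2 = 1.0, 1.5
sigma2 = 0.64

mean1 = lognormal_mean(mu1, sigma2)
mean2 = lognormal_mean(mu2, sigma2)
ratio_of_means = mean2 / mean1

print(mean1, mean2, ratio_of_means)
print(lognormal_variance(mu1, sigma2), lognormal_variance(mu2, sigma2))
```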

Steve Denham

JamesLin
Fluorite | Level 6

I don't understand what you mean by "biased". My goal is to get a ratio of two expected responses, i.e.


E[Y | trt = 2] / E[Y | trt = 1]

Correspondingly, an unbiased estimator of E[Y | trt = x] is an arithmetic average of responses under treatment x.

I found those formulas in the SAS manual, but they don't resolve my question. The two-sample test is equivalent to

log(Y1) ~ N(mu1, sigma2)

log(Y2) ~ N(mu2, sigma2)

So E[Y2] / E[Y1] = exp(mu2 - mu1)  because the sigma2 term cancels out, right?
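Written out with the standard lognormal mean, and assuming as above that both groups share the same log-scale variance sigma2, the cancellation looks like this:

```latex
\begin{aligned}
E[Y_i] &= \exp\!\left(\mu_i + \tfrac{\sigma^2}{2}\right), \qquad i = 1, 2, \\
\frac{E[Y_2]}{E[Y_1]}
  &= \frac{\exp\!\left(\mu_2 + \sigma^2/2\right)}{\exp\!\left(\mu_1 + \sigma^2/2\right)}
   = \exp(\mu_2 - \mu_1).
\end{aligned}
```

Under that common-variance assumption, the population ratio of arithmetic means coincides with the population ratio of geometric means. If the two groups had different log-scale variances, the sigma2 terms would no longer cancel.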

SteveDenham
Jade | Level 19

Regarding:

Correspondingly, an unbiased estimator of E[Y | trt = x] is an arithmetic average of responses under treatment x

This is only true for certain distributions, and certainly is not the case for distributions such as lognormal, Poisson, negative binomial, gamma and several others.  If it were true, there would never have been much need to develop generalized linear models.

Steve Denham

JamesLin
Fluorite | Level 6

Steve:

To make it clear, I placed the formulas in this post:

r - Estimating the ratio of cell means in ANOVA under lognormal assumption - Cross Validated

The problem is that I managed to deduce that exp(b1) should be estimated as the ratio of arithmetic cell means but, on the other hand, that it should be estimated as the ratio of geometric cell means. Apparently, that's impossible, so I need to know where I made a mistake.

Regards,

James

