08-14-2014 02:07 PM
I am conducting a two-sample test (1-way ANOVA with 2 treatments), and the goal is to estimate the ratio of cell means assuming that the data are lognormal. A simple approach is to log the response and fit a model
log(Y) = b0 + b1 * X
and then estimate the ratio as
R = exp(b1).
However, that gives the ratio of geometric cell means rather than arithmetic cell means.
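A minimal Python sketch of this point, using made-up lognormal data (the seed, sample sizes, and log-scale parameters are assumptions, not values from the thread). With X coded 0 for treatment 1 and 1 for treatment 2, the OLS slope on log(Y) is the difference of the mean logs. That makes exp(b1) exactly the ratio of the sample geometric means.

```python
import math
import random
import statistics

# Hypothetical data: two treatment groups drawn from lognormal distributions
# with a common log-scale sigma (assumed values, not from the thread).
rng = random.Random(2014)
y1 = [rng.lognormvariate(1.0, 0.8) for _ in range(30)]
y2 = [rng.lognormvariate(1.4, 0.8) for _ in range(30)]

# Ordinary least squares for log(Y) = b0 + b1 * X, X = 0 (trt 1) or 1 (trt 2).
x = [0] * len(y1) + [1] * len(y2)
logy = [math.log(v) for v in y1 + y2]
xbar = statistics.fmean(x)
lbar = statistics.fmean(logy)
b1 = sum((xi - xbar) * (li - lbar) for xi, li in zip(x, logy)) / sum(
    (xi - xbar) ** 2 for xi in x
)
b0 = lbar - b1 * xbar

ratio_exp_b1 = math.exp(b1)
ratio_geometric = statistics.geometric_mean(y2) / statistics.geometric_mean(y1)
ratio_arithmetic = statistics.fmean(y2) / statistics.fmean(y1)

print(f"exp(b1)                   = {ratio_exp_b1:.4f}")
print(f"ratio of geometric means  = {ratio_geometric:.4f}")
print(f"ratio of arithmetic means = {ratio_arithmetic:.4f}")
```

In any particular sample, the last line will generally differ from the first two.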
I assumed that if I fit a "proper" lognormal model, using either gamlss in R or PROC GLIMMIX in SAS, I would get the ratio of arithmetic means, but for some reason both procedures produce the same slope as the log(Y) regression.
This is odd because when I use this approach with Poisson or Negative Binomial regression, I do get the ratio of arithmetic means. What am I missing?
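The Poisson case can be checked directly. Below is a small Newton-Raphson fit of the Poisson model log(mu) = b0 + b1 * X, written from scratch in Python. The counts are hypothetical, and X is assumed to be a 0/1 treatment indicator. Because X takes only two values, the score equations reduce to sum(y - mu) = 0 within each treatment. The fitted cell means are therefore the arithmetic means, and exp(b1) is their ratio. The negative binomial with a log link behaves the same way, because its score only rescales each (y - mu) term by a factor that is constant within a cell.

```python
import math

# Hypothetical count data for two treatments (not from the thread).
y1 = [2, 4, 3, 0, 5, 1, 3, 2]
y2 = [6, 4, 7, 5, 3, 8, 6, 5]
data = [(0, y) for y in y1] + [(1, y) for y in y2]  # (X, Y), X = 0/1 indicator

# Newton-Raphson for Poisson regression log(mu) = b0 + b1 * X.
b0, b1 = math.log(sum(y1 + y2) / len(data)), 0.0
for _ in range(50):
    u0 = u1 = 0.0            # score vector
    i00 = i01 = i11 = 0.0    # Fisher information (symmetric 2x2)
    for x, y in data:
        mu = math.exp(b0 + b1 * x)
        u0 += y - mu
        u1 += x * (y - mu)
        i00 += mu
        i01 += x * mu
        i11 += x * x * mu
    det = i00 * i11 - i01 * i01
    # Solve the 2x2 system I * delta = U and take the Newton step.
    b0 += (i11 * u0 - i01 * u1) / det
    b1 += (i00 * u1 - i01 * u0) / det

ratio_arithmetic = (sum(y2) / len(y2)) / (sum(y1) / len(y1))
print(f"exp(b1) = {math.exp(b1):.6f}, ratio of arithmetic means = {ratio_arithmetic:.6f}")
```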
08-14-2014 03:16 PM
The lognormal is a bit of an "odd duck" among distributions. You are basically saying that log(Y) is normal. As stated in the GLIMMIX User's Guide, the distribution fitted is "not the distribution of Y". Thus, the antilog of the fitted value is not the mean of Y, although it is related to the mean of Y. You can get the required ratio of means by using the normal distribution with a log link.
proc glimmix data=b;
class trt; *two levels;
model y = trt / s dist=normal link=log;
lsmeans trt / cl diff ilink;
run;
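Exponentiating the trt 1 estimate in the solution table (requested by the S option) gives the ratio you are looking for. Exponentiating the difference of the two LS-means on the link scale, exp(mu1 - mu2), gives the same thing.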
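Here is why the normal distribution with a log link recovers arithmetic means, as a Python sketch rather than the GLIMMIX fit itself. The data and seed are hypothetical. X is coded 0 for treatment 1 and 1 for treatment 2, so exp(b1) here is mu2/mu1, and the coding in the SAS output may run the other way. With constant variance, maximum likelihood is least squares. In a two-cell model, the least-squares solution sets each fitted mean to its cell's sample mean.

```python
import math
import random

# Hypothetical positive data for two treatments (assumed values, not from the thread).
rng = random.Random(8)
y1 = [rng.lognormvariate(1.0, 0.8) for _ in range(25)]
y2 = [rng.lognormvariate(1.4, 0.8) for _ in range(25)]
data = [(0, y) for y in y1] + [(1, y) for y in y2]

# Gauss-Newton least squares for Y ~ N(mu, sigma^2), log(mu) = b0 + b1 * X.
b0, b1 = math.log(sum(y1 + y2) / len(data)), 0.0
for _ in range(100):
    g0 = g1 = 0.0            # J^T r
    h00 = h01 = h11 = 0.0    # J^T J
    for x, y in data:
        mu = math.exp(b0 + b1 * x)
        # Jacobian row: d mu / d b0 = mu, d mu / d b1 = mu * x
        g0 += mu * (y - mu)
        g1 += mu * x * (y - mu)
        h00 += mu * mu
        h01 += mu * mu * x
        h11 += mu * mu * x * x
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

ratio_arithmetic = (sum(y2) / len(y2)) / (sum(y1) / len(y1))
print(f"exp(b1) = {math.exp(b1):.6f}, ratio of arithmetic means = {ratio_arithmetic:.6f}")
```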
08-14-2014 03:37 PM
I assume a normal distribution with a log link means that Y ~ N(mu, sigma^2), where mu = exp(x'b). So although the mean response is guaranteed to be positive, this distribution can still generate negative observations. That doesn't make much sense for my data, because my observations are always positive.
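To put a number on this concern, the short Python sketch below computes P(Y < 0) under a normal model with a log-link mean. The coefficients and residual SD are hypothetical. The mean exp(b0 + b1 * X) is always positive, but the normal tail below zero is not empty.

```python
import math
from statistics import NormalDist

# Hypothetical fitted values: log-scale coefficients and residual SD
# (assumed for illustration, not from the thread).
b0, b1, sigma = 1.0, 0.4, 3.0

for x in (0, 1):
    mu = math.exp(b0 + b1 * x)                   # positive by construction
    p_negative = NormalDist(mu, sigma).cdf(0.0)  # probability mass below zero
    print(f"X={x}: mean={mu:.3f}, P(Y<0)={p_negative:.4f}")
```

Whether that tail matters depends on how large sigma is relative to the means. The point estimate of the ratio is unaffected either way.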
08-14-2014 03:48 PM
You asked about getting the ratio of the two means by using exp(b), and I showed you how to do it in GLIMMIX (it works in GENMOD also). It will always work for the point estimate, provided the means are positive. But I did not say you should be doing this. As Steve wrote, if you choose the lognormal for your distribution, you will have to do post-fitting calculations in a DATA step to get the means on the original scale. Those means are not available in the output.
08-14-2014 03:22 PM
I guess my first question would be: If the data are lognormally distributed, why would you want a ratio of the arithmetic means, knowing that the arithmetic means are biased? The ratio of geometric means is at least something closer. Note that the expected values and variances are not obtained by simple exponentiation, so a ratio of expected values is going to involve a few lines of DATA step programming. See the documentation for the DIST= option of the MODEL statement, and look below the table for the paragraphs on the lognormal distribution, where the equations for the expected value and variance are given.
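The "few lines of programming" amount to plugging log-scale estimates into the standard lognormal moment formulas, E[Y] = exp(mu + sigma^2/2) and Var[Y] = (exp(sigma^2) - 1) * exp(2*mu + sigma^2). Below is a Python sketch of that back-transformation. The function names and the log-scale estimates are hypothetical, standing in for LS-means on the log scale and a common residual variance.

```python
import math

def lognormal_mean(mu, sigma2):
    """E[Y] when log(Y) ~ N(mu, sigma2)."""
    return math.exp(mu + sigma2 / 2.0)

def lognormal_var(mu, sigma2):
    """Var[Y] when log(Y) ~ N(mu, sigma2)."""
    return (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)

# Hypothetical log-scale estimates (assumed values for illustration only).
mu1, mu2, sigma2 = 1.0, 1.4, 0.64

m1 = lognormal_mean(mu1, sigma2)
m2 = lognormal_mean(mu2, sigma2)
print(f"E[Y|trt=1] = {m1:.4f}, E[Y|trt=2] = {m2:.4f}, ratio = {m2 / m1:.4f}")
print(f"exp(mu2 - mu1) = {math.exp(mu2 - mu1):.4f}")
print(f"Var[Y|trt=1] = {lognormal_var(mu1, sigma2):.4f}")
```

These are plug-in values. When both groups share one sigma2, the ratio of the back-transformed means collapses to exp(mu2 - mu1).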
08-14-2014 04:05 PM
I don't understand what you mean by "biased". My goal is to get a ratio of two expected responses, i.e.
E[Y | trt = 2] / E[Y | trt = 1]
Correspondingly, an unbiased estimator of E[Y | trt = x] is an arithmetic average of responses under treatment x.
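This claim can be checked by simulation. The Python sketch below uses assumed parameters (mu = 0, sigma = 0.5, n = 10 per sample, fixed seed). It draws many lognormal samples and averages the per-sample arithmetic means, which should land close to E[Y] = exp(mu + sigma^2/2). The per-sample geometric means, by contrast, center near the median exp(mu).

```python
import math
import random
import statistics

# Assumed log-scale parameters, sample size, and replication count (illustrative only).
mu, sigma, n, reps = 0.0, 0.5, 10, 20000
rng = random.Random(42)

sample_means = []
geometric_means = []
for _ in range(reps):
    y = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    sample_means.append(statistics.fmean(y))
    geometric_means.append(statistics.geometric_mean(y))

true_mean = math.exp(mu + sigma**2 / 2)  # E[Y] for a lognormal
print(f"E[Y]                          = {true_mean:.4f}")
print(f"average of sample means       = {statistics.fmean(sample_means):.4f}")
print(f"average of sample geom. means = {statistics.fmean(geometric_means):.4f}")
```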
I found those formulas in the SAS manual, but the result doesn't make sense to me. The two-sample test is equivalent to
log(Y1) ~ N(mu1, sigma^2)
log(Y2) ~ N(mu2, sigma^2)
Since E[Yj] = exp(muj + sigma^2/2), E[Y2] / E[Y1] = exp(mu2 - mu1), because the sigma^2/2 term cancels out, right?
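Written out, this step relies on the lognormal mean formula. Assuming X in the log(Y) regression is coded 0 for treatment 1 and 1 for treatment 2, b1 = mu2 - mu1:

```latex
\begin{aligned}
\log Y_j &\sim N(\mu_j, \sigma^2), \qquad j = 1, 2 \\
E[Y_j] &= \exp\!\left(\mu_j + \tfrac{\sigma^2}{2}\right) \\
\frac{E[Y_2]}{E[Y_1]} &= \frac{\exp(\mu_2 + \sigma^2/2)}{\exp(\mu_1 + \sigma^2/2)} = \exp(\mu_2 - \mu_1) = \exp(b_1)
\end{aligned}
```

The cancellation depends on the common variance. If the two log-scale variances s1^2 and s2^2 differ, the ratio becomes exp(mu2 - mu1 + (s2^2 - s1^2)/2).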
08-15-2014 07:45 AM
Correspondingly, an unbiased estimator of E[Y | trt = x] is an arithmetic average of responses under treatment x
This is only true for certain distributions, and certainly is not the case for distributions such as lognormal, Poisson, negative binomial, gamma and several others. If it were true, there would never have been much need to develop generalized linear models.
08-16-2014 02:43 PM
To make it clear, I placed the formulas in this post:
The problem is that I managed to deduce that exp(b1) should be estimated by the ratio of arithmetic cell means but, on the other hand, also by the ratio of geometric cell means. Apparently, that is impossible, so I need to know where I made a mistake.
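Here is one way to reconcile the two deductions, under the equal-variance lognormal model from the earlier posts. exp(b1) = exp(mu2 - mu1) is a single population quantity, and both the ratio of arithmetic cell means and the ratio of geometric cell means are consistent estimators of it. They are different estimators, though, so they disagree in any finite sample. A lognormal likelihood (as fitted by gamlss or by PROC GLIMMIX with a lognormal distribution) estimates each mu_j by the mean of log(Y), so its exp(b1) is the geometric-mean ratio. The Poisson, negative binomial, and normal-with-log-link likelihoods instead set each fitted cell mean to the arithmetic mean. The Python simulation below uses assumed parameters and a fixed seed, and shows both ratios approaching exp(mu2 - mu1) as n grows.

```python
import math
import random
import statistics

# Assumed model: log(Y1) ~ N(1.0, 0.8^2), log(Y2) ~ N(1.4, 0.8^2),
# so the target exp(mu2 - mu1) is exp(0.4).
mu1, mu2, sigma = 1.0, 1.4, 0.8
target = math.exp(mu2 - mu1)
rng = random.Random(7)

results = {}
for n in (20, 2000, 200000):
    y1 = [rng.lognormvariate(mu1, sigma) for _ in range(n)]
    y2 = [rng.lognormvariate(mu2, sigma) for _ in range(n)]
    # Lognormal maximum likelihood: ratio of geometric means.
    ratio_geo = statistics.geometric_mean(y2) / statistics.geometric_mean(y1)
    # Plug-in from the sample means: ratio of arithmetic means.
    ratio_arith = statistics.fmean(y2) / statistics.fmean(y1)
    results[n] = (ratio_geo, ratio_arith)
    print(f"n={n:>6}: geometric={ratio_geo:.4f}  arithmetic={ratio_arith:.4f}  target={target:.4f}")
```

Under this model, the geometric-mean ratio is the more efficient of the two. The arithmetic-mean ratio does not need the lognormal assumption to target E[Y2]/E[Y1].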