03-18-2014 02:27 PM
I want to get the adjusted mean for two groups after controlling for a continuous confounder. In its simplest form, the model is specified along these lines:
cost = group + confounder
I am inclined to fit this model with dist=gamma, since the cost (>0) distribution follows a gamma much more closely than a normal distribution. I compare two GENMOD models:
1) dist=gamma; link=log
2) dist=normal; link=identity
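For reference, the two specifications could be written roughly as follows. This is only a sketch: the dataset name `test` and the variable names `cost`, `group`, and `confounder` are assumed for illustration, and the ILINK option on LSMEANS is used to back-transform the gamma LS-means from the log scale.

```sas
/* Model 1: gamma distribution, log link (assumed dataset and variable names) */
proc genmod data=test;
   class group;
   model cost = group confounder / dist=gamma link=log;
   lsmeans group / ilink cl;   /* ILINK reports LS-means on the original cost scale */
run;

/* Model 2: normal distribution, identity link */
proc genmod data=test;
   class group;
   model cost = group confounder / dist=normal link=identity;
   lsmeans group / cl;         /* already on the cost scale */
run;
```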
As expected, with no confounder I get the exact arithmetic mean for each group with both models (after back-transforming from the log link in the gamma model). With the confounder in, I get "logical" adjusted means with the normal model but "biased" low estimates for both groups with the gamma model. Below are the actual (arithmetic) means and the adjusted means with each method:
Group | N | Confounder | Cost UnAdj | Cost Adj Normal | Cost Adj Gamma | LS-means Gamma
So what method should I follow and why? I feel inclined to use the "normal" adjustment in my results, but what is the justification for using a regular "normal" model rather than a gamma model when the cost distribution follows a gamma?
Thanks much for your help
03-19-2014 01:29 PM
If there is any biological reason for choosing one model over the other, then I would choose the biologically meaningful model.
Otherwise, I would choose a model that fits the data and that can produce meaningful estimates. I would therefore recommend doing some assessment of the model fit, which is easily done with the ASSESS statement in PROC GENMOD. In particular, you are interested in assessing the link function, so you can add something like this line after the MODEL statement in the GENMOD procedure:
ASSESS LINK / resample=50 npaths=1;
There are excellent examples in the SAS documentation of how to interpret the assessment graphs. You can increase the number of simulated paths (RESAMPLE= for the test, NPATHS= for the plot), but start with small numbers.
Unfortunately, the computation time of the ASSESS statement is O(n^2), so your number of observations must not be very high. If the procedure is still running after an hour, then interrupt it and forget my suggestion until the SAS programmers optimize the assessment method.
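In context, the link check might look like the sketch below. The dataset name `test`, the variable names, and the seed value are assumptions for illustration; RESAMPLE= and SEED= are the ASSESS options as I understand them from the GENMOD documentation, and ODS Graphics needs to be on for the plots to appear.

```sas
ods graphics on;

proc genmod data=test;
   class group;
   model cost = group confounder / dist=gamma link=log;
   /* Check the link function with a small number of simulated paths first */
   assess link / resample=50 seed=12345;
run;

ods graphics off;
```

A small RESAMPLE= value keeps the run short; it can be raised once the run time on the real data is known.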
03-21-2014 03:08 PM
Thanks Jacob and Pronabesh for your responses.
My only reason to choose one model over the other is this: I want to get the "exact" adjusted means controlling for a set of confounders. For this, model fit is of no importance. Let me explain.
If we run the model cost=group, i.e., without confounders, we get the exact arithmetic mean for each group, even though "group" is non-significant and the model fit is poor (R-sq=0.00031, with AIC and BIC almost as high as for the null model). The same applies to adjusted LS-MEANS when confounders are included: model fit might be poor, but we still get "exact" LS-MEANS.
Based on this, I know that I want to choose the normal model to show my adjusted means, because the gamma model provides me with something "different". And that's what I want to know: what is the gamma model providing me with? Why do the adjusted means (LS-MEANS) that I get with a gamma model not preserve the overall arithmetic mean of the sample? Why is it not advisable to get adjusted (arithmetic) means with models other than normal?
(Thanks Jacob for your comment on the ASSESS statement in GENMOD. I will look into it for my own enjoyment even if it doesn't help here.)
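If the goal is an adjusted mean on the arithmetic scale from the gamma model, one possible approach (not something suggested in this thread) is to average the model's predicted costs over the whole sample with everyone assigned to each group in turn. The sketch below assumes a dataset `test` with a numeric 0/1 `group` variable; `stacked`, `pred`, `scenario`, and `mu_hat` are hypothetical names. It relies on GENMOD excluding rows with a missing response from the fit while still writing predictions for them to the OUTPUT dataset.

```sas
/* Stack the original data with two scoring copies, one per group assignment */
data stacked;
   set test(in=fit) test(in=as0) test(in=as1);
   length scenario $8;
   if fit then scenario = 'fit';
   else do;
      cost = .;                          /* missing response: scored, not fitted */
      if as0 then group = 0;
      else group = 1;                    /* assumes group is coded 0/1 */
      scenario = cats('group', group);
   end;
run;

proc genmod data=stacked;
   class group;
   model cost = group confounder / dist=gamma link=log;
   output out=pred pred=mu_hat;          /* predicted mean cost for every row */
run;

/* Average predicted cost under each hypothetical group assignment */
proc means data=pred mean;
   where scenario ne 'fit';
   class scenario;
   var mu_hat;
run;
```

These averages are not LS-means, and this sketch does not produce standard errors for them.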
03-24-2014 09:41 AM
Look at the expected value of a gamma distributed variable (Wikipedia has this). You will see that it is NOT the arithmetic average as for the normal distribution. Thus, there is no reason to expect that the mean (arithmetic average) and the lsmean (best linear unbiased estimate) should be the same. In fact, the arithmetic average will be consistently biased above the expected value, and will not be a good predictor of future outcomes.
03-24-2014 06:00 PM
Thanks very much for your response, Steve. That's what I am trying to explain to my audience. As you say, the expected value of the gamma distribution is the product of the shape and scale parameters, which is different from the arithmetic mean. So my conclusion is that we don't get adjusted arithmetic means with exponential-family distributions and link functions.
Still, the overall arithmetic mean and the LS-MEAN for a null gamma model are the same. It's only when covariates are included that they differ.
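A possible reason the group-only gamma model reproduces the arithmetic means can be sketched from the likelihood equations. This assumes a log link and maximum likelihood estimation; the symbols $\mu_g$, $n_g$, and $\bar{y}_g$ are notation introduced here, not taken from the thread.

```latex
% Score equations of a gamma GLM with log link (variance function V(\mu) = \mu^2):
\sum_{i} \frac{y_i - \mu_i}{\mu_i}\, x_{ij} = 0
\quad \text{for each column } j \text{ of the design matrix.}

% With only group indicators, \mu_i = \mu_g for every observation in group g, so
\sum_{i \in g} \frac{y_i - \mu_g}{\mu_g} = 0
\;\Longrightarrow\;
\hat{\mu}_g = \frac{1}{n_g} \sum_{i \in g} y_i = \bar{y}_g .
```

So with no covariates the fitted group mean is the group's arithmetic mean, which matches what both models return here.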
Of course, a different topic is whether the arithmetic mean is the proper statistic to report, but that's unfortunately hard to change in some instances.
03-25-2014 08:39 AM
I am amazed that the arithmetic mean and the lsmean are the same. On the original scale, the lsmean should be much closer to the geometric mean.
Inclusion of covariates will move lsmeans to different values from raw means, no matter the distribution, unless the observations are perfectly balanced across categorical covariates (those in the CLASS statement) or all observations have identical values for continuous covariates. Lsmeans are best linear unbiased estimates, so this isn't surprising.
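With a log link there is a further effect that may explain the lower gamma estimates. The sketch below assumes the LS-mean is evaluated at the sample mean of the confounder (the LSMEANS default) and then back-transformed; $\hat\beta_0$, $\hat\beta_g$, $\hat\beta_c$, $x_i$, and $\bar{x}$ are notation introduced here for the intercept, group effect, confounder slope, confounder values, and confounder mean.

```latex
% Back-transformed LS-mean for group g (confounder fixed at its sample mean):
\widehat{\mathrm{LSM}}_g = \exp\!\left(\hat\beta_0 + \hat\beta_g + \hat\beta_c\,\bar{x}\right)

% Average predicted cost if every observation were assigned to group g:
\bar{\mu}_g = \frac{1}{n} \sum_{i=1}^{n} \exp\!\left(\hat\beta_0 + \hat\beta_g + \hat\beta_c\,x_i\right)

% Jensen's inequality for the convex function exp:
\widehat{\mathrm{LSM}}_g \le \bar{\mu}_g ,
\quad \text{with equality only if } \hat\beta_c = 0 \text{ or all } x_i \text{ are equal.}
```

Under the identity link both quantities coincide, because the prediction is linear in the confounder.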
03-25-2014 11:36 PM
"I am amazed that the arithmetic mean and the lsmean are the same. On the original scale, the lsmean should be much closer to the geometric mean".
I agree, but that's the case when modeling a gamma distribution. When we compare the two models below, the LS-MEANS are the same (after back-transforming from the log link).
proc genmod data=test;
model y=d / dist=normal link=identity;
run;
proc genmod data=test;
model y=d / dist=gamma link=log;
run;
I also agree with your second statement, which is a nice, brief explanation of the inner workings of LS-MEANS.
Thanks again for your response.