JavierAbalo
Calcite | Level 5

I want to get the adjusted mean for two groups after controlling for a continuous confounder. In its simplest form, the model is specified along these lines:

cost = group + confounder

I am inclined to fit this model with dist=gamma, since the distribution of cost (always > 0) is much closer to a gamma than to a normal distribution. I compare two GENMOD models:

1) dist=gamma; link=log

2) dist=normal; link=identity

As expected, with no confounder I get the exact arithmetic mean for each group with both models (after transforming back the log link in the gamma model). With the confounder in, I get "logical" adjusted means with the normal model but "biased" low estimates for both groups with the gamma model. Below are the actual (arithmetic) means and the adjusted means with each method:

Group     N       Confounder   Cost UnAdj   Cost Adj Normal   Cost Adj Gamma   ls-means gamma
Group 1   18226   11.48        846.48       757.55            538.68           exp(6.2854)
Group 2   15208   10.43        935.93       1042.51           708.96           exp(6.5638)

So which method should I follow, and why? I feel inclined to use the "normal" adjustment in my results, but what is the justification for using a regular "normal" model rather than a gamma model when the cost distribution follows a gamma?

Thanks much for your help

7 REPLIES
JacobSimonsen
Barite | Level 11

If there is any biological reason for choosing one model over the other, then I will choose the biologically meaningful model.

Otherwise, I will choose a model that fits the data and can produce meaningful estimates. I therefore recommend doing some assessment of the model fit, which is easily done with the ASSESS statement in PROC GENMOD. In particular, you are interested in assessing the link function, so you can add something like this line after the MODEL statement in the GENMOD procedure:

ASSESS  LINK / nsample=50  nsim=1;

There are excellent examples in the SAS documentation of how to interpret the assessment graphs. You can turn up the number of simulations (nsim and nsample), but start with small numbers.

Unfortunately, the calculation time of the ASSESS statement is O(n^2), so your number of observations must not be very high. If the procedure is still running after an hour, interrupt it and forget my suggestion until the SAS programmers optimize the assessment method.
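
For context, here is a sketch of how the ASSESS statement might sit in a complete GENMOD step for the model in the question. The data set name test and the variable names are assumptions, and the options shown (NPATHS=, RESAMPLE=, SEED=) are the ones I recall from the GENMOD documentation, so please check them against your release.

```sas
/* Sketch only: test, cost, group and confounder are assumed names. */
ods graphics on;

proc genmod data=test;
  class group;
  model cost = group confounder / dist=gamma link=log;
  /* Cumulative-residual check of the link function.             */
  /* NPATHS=   number of simulated paths drawn in the plot.      */
  /* RESAMPLE= number of simulated paths behind the p-value.     */
  /* SEED=     fixes the random stream so the run is repeatable. */
  assess link / npaths=20 resample=200 seed=12345;
run;

ods graphics off;
```

Keeping RESAMPLE= small at first follows the advice above about run time on large data sets.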

Jacob

pronabesh
Fluorite | Level 6

I believe you can also test the model fit by looking at AIC, BIC etc (lower is better)
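
A sketch of how the fit criteria from the two candidate models could be collected for comparison. It assumes the ODS table that holds them is named ModelFit with columns Criterion and Value; the data set and variable names are placeholders.

```sas
/* Sketch only: test, cost, group and confounder are assumed names. */
proc genmod data=test;
  class group;
  model cost = group confounder / dist=gamma link=log;
  ods output ModelFit=fit_gamma;    /* goodness-of-fit table, incl. AIC and BIC */
run;

proc genmod data=test;
  class group;
  model cost = group confounder / dist=normal link=identity;
  ods output ModelFit=fit_normal;
run;

/* Stack the two tables and label each row with the model it came from. */
data fit_both;
  length model $6;
  set fit_gamma(in=g) fit_normal;
  model = ifc(g, 'gamma', 'normal');
run;

proc print data=fit_both noobs;
  var model Criterion Value;
run;
```

Both models are fitted to the same untransformed response, so their likelihood-based criteria are on a comparable footing.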

JavierAbalo
Calcite | Level 5

Thanks Jacob and Pronabesh for your responses.

My only reason to choose one model over the other is this: I want to get the "exact" adjusted means controlling for a set of confounders. For this, model fit is of no importance. Let me explain.

If we run the model cost=group, i.e., without confounders, we get the exact arithmetic means for each group even when "group" is non-significant and the model fit is poor (R-sq=0.00031, with AIC and BIC almost as high as in the null model). The same applies to adjusted LS-MEANS when confounders are included: model fit might be poor, but we still get "exact" LS-MEANS.

Based on this, I know that I want to choose the normal model to show my adjusted means, because the gamma model provides me with something "different". And that's what I want to know: what is the gamma model providing me with? Why do the adjusted means (LS-MEANS) that I get with a gamma model not preserve the overall arithmetic mean of the sample? Why is it not advisable to get adjusted (arithmetic) means with models other than normal?
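
One way to read the difference, offered as a sketch rather than a description of GENMOD internals: assume the LS-mean for group g is the linear predictor evaluated at the overall mean of the confounder and then back-transformed. The symbols \beta, \gamma, x_i and \bar{x} are notation introduced here, not taken from the posts above.

```latex
% Back-transformed LS-mean under the log link (confounder fixed at its mean):
\hat{\mu}^{\mathrm{LS}}_{g} = \exp\!\left(\hat{\beta}_0 + \hat{\beta}_g + \hat{\gamma}\,\bar{x}\right)

% Average predicted cost if every observation were placed in group g:
\hat{\mu}^{\mathrm{avg}}_{g} = \frac{1}{n}\sum_{i=1}^{n} \exp\!\left(\hat{\beta}_0 + \hat{\beta}_g + \hat{\gamma}\,x_i\right)

% Jensen's inequality (exp is convex):
\hat{\mu}^{\mathrm{avg}}_{g} \;\ge\; \hat{\mu}^{\mathrm{LS}}_{g},
\quad \text{with equality only if } \hat{\gamma} = 0 \text{ or all } x_i \text{ are equal.}
```

With the identity link the exponential drops out and the two quantities coincide, which would explain why the normal model's LS-means behave like arithmetic-scale means while the back-transformed gamma LS-means can sit below them.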

(Thanks Jacob for your comment on the ASSESS statement in GENMOD; I will look into it for my own enjoyment even if it doesn't help here.)

SteveDenham
Jade | Level 19

Look at the expected value of a gamma distributed variable (Wikipedia has this).  You will see that it is NOT the arithmetic average as for the normal distribution.  Thus, there is no reason to expect that the mean (arithmetic average) and the lsmean (best linear unbiased estimate) should be the same.  In fact, the arithmetic average will be consistently biased above the expected value, and will not be a good predictor of future outcomes.

Steve Denham

JavierAbalo
Calcite | Level 5

Thanks much for your response Steve. That's what I am trying to explain to my audience. As you say, the expected value of the Gamma distribution is the product of shape and scale parameters, which is different from the arithmetic mean. So my conclusion is that we don't get adjusted arithmetic means with (the family of) exponential distributions and link functions.

Still, the overall arithmetic mean and the LS-MEAN for a null gamma model are the same. It's only when covariates are included that they differ.
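
The agreement in the no-covariate case can be checked from the likelihood equations. A short derivation, assuming a gamma model with log link and group indicators only (\mu_g, n_g and the shape \nu are notation introduced here):

```latex
% Gamma log-likelihood contribution with mean \mu_i and shape \nu:
\ell_i = \nu\left(-\frac{y_i}{\mu_i} - \log \mu_i\right) + \text{terms free of } \mu_i

% Score for coefficient \beta_j under \log \mu_i = x_i^{\top}\beta:
\frac{\partial \ell}{\partial \beta_j} = \nu \sum_{i} \frac{y_i - \mu_i}{\mu_i}\, x_{ij} = 0

% Group indicators only, so \mu_i = \mu_g for every i in group g:
\sum_{i \in g} \frac{y_i - \mu_g}{\mu_g} = 0
\;\Longrightarrow\;
\hat{\mu}_g = \frac{1}{n_g}\sum_{i \in g} y_i = \bar{y}_g
```

Once a covariate enters, \mu_i varies within a group and the group equation becomes \sum_{i \in g} (y_i/\mu_i - 1) = 0, which constrains the average ratio of observed to fitted cost rather than the average cost itself.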

Of course, a different topic is whether the arithmetic mean is the proper statistic to report, but that's unfortunately hard to change in some instances.

SteveDenham
Jade | Level 19

I am amazed that the arithmetic mean and the lsmean are the same.  On the original scale, the lsmean should be much closer to the geometric mean.

Inclusion of covariates will move lsmeans to different values from raw means, no matter the distribution, unless the numbers of observations are perfectly balanced (categorical covariates in the CLASS statement) or all observations have identical values for continuous covariates.  Lsmeans are best linear unbiased estimates, so this isn't surprising.
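
For the normal, identity-link model with one continuous covariate and a common slope, the adjustment can be written in closed form. A sketch, with \bar{y}_g, \bar{x}_g, \bar{x} and \hat{\gamma} as notation introduced here:

```latex
% Least squares with group intercepts and a common slope \hat{\gamma} passes
% through each group's point of means (\bar{x}_g, \bar{y}_g), so the LS-mean
% evaluated at the overall covariate mean \bar{x} is
\mathrm{LSMEAN}_g = \bar{y}_g - \hat{\gamma}\,(\bar{x}_g - \bar{x})
```

The correction vanishes when \bar{x}_g = \bar{x} in every group, which covers the case where all observations share the same covariate value.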

Steve Denham

JavierAbalo
Calcite | Level 5

"I am amazed that the arithmetic mean and the lsmean are the same. On the original scale, the lsmean should be much closer to the geometric mean".

I agree, but that's the case when modeling a gamma distribution. When we compare these two models below, the LS-MEANS are the same (after transforming the log link).

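proc genmod data=test;
  class d;                                /* LSMEANS needs d as a CLASS variable */
  model y=d / dist=normal link=identity;
  lsmeans d;
run;

proc genmod data=test;
  class d;
  model y=d / dist=gamma link=log;
  lsmeans d;                              /* on the log scale; exponentiate to compare */
run;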
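
For the models with a confounder, here is a sketch of how the back-transformed LS-mean and an average predicted cost per group could both be obtained from the gamma fit. All data set names, the variable names cost, group and confounder, and the 1/2 coding of group are assumptions made for illustration.

```sas
/* Sketch only: every data set and variable name here is assumed. */
proc genmod data=test;
  class group;
  model cost = group confounder / dist=gamma link=log;
  lsmeans group / ilink cl;    /* back-transformed LS-means at the mean confounder */
  store gamma_fit;             /* save the fitted model for scoring */
run;

/* Copy every observation once per group level, keeping its own confounder value. */
data counterfactual;
  set test(keep=confounder);
  do group = 1 to 2;           /* assumes group is numeric and coded 1/2 */
    output;
  end;
run;

proc plm restore=gamma_fit;
  score data=counterfactual out=scored predicted=pred_cost / ilink;
run;

/* Average predicted cost per group over the observed confounder values. */
proc means data=scored mean;
  class group;
  var pred_cost;
run;
```

The first quantity fixes the confounder at its mean before back-transforming; the second averages predictions made on the cost scale, so it is the one closer in spirit to an adjusted arithmetic mean.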

I also agree with your second statement, which is a nice, brief explanation of the inner workings of LS-MEANS.

Thanks again for your response.
