uzma03505621
Obsidian | Level 7

Hi everyone, I have a quick question about fitting a generalized linear model with a gamma distribution and a log link function.

For my research, my study sample consists of patients with hypertension, categorized by depression status as follows:

1) Hypertension + treated depression (n=1150)

2) Hypertension + untreated depression (n=702)

3) Hypertension + no depression (n=5229)

I need to estimate the differences in healthcare expenditures (cost) among these groups. I used the following syntax.

 

proc genmod data=studydata;
  class depressioncategory age gender insurancestatus;
  model cost = depressioncategory age gender insurancestatus income povertylevel / dist=gamma link=log;
  output out=data2 pred=phat;
run;

 

Now from this output, how can I interpret the differences in the costs in my three categories? Please guide me.

 

StatDave
SAS Super FREQ

You can use the NLMeans macro to estimate the differences in means among the categories. See the Results tab of the macro documentation for an example that involves a gamma model. There are also several links to other examples.
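
As a rough sketch of that workflow, assuming the data set and variable names from the question and that the NLMeans macro source has already been downloaded and submitted in the session: fit the model and save it with STORE, have PROC PLM write the LS-mean coefficients to a data set, then pass both to the macro. The item store name gammafit, the data set name coeffs, and the title text are placeholders, and the macro parameters shown (instore=, coef=, link=, title=) should be checked against the macro documentation for your version.

```sas
/* Fit the gamma model and save it in an item store (name is a placeholder) */
proc genmod data=studydata;
  class depressioncategory age gender insurancestatus;
  model cost = depressioncategory age gender insurancestatus income povertylevel
        / dist=gamma link=log;
  store gammafit;
run;

/* E prints the LS-mean coefficients; DIFF adds the pairwise difference rows.
   The Coef table is saved to a data set for the macro. */
proc plm restore=gammafit;
  lsmeans depressioncategory / e ilink diff;
  ods output coef=coeffs;
run;

/* Assumes the %NLMeans macro definition has already been submitted */
%NLMeans(instore=gammafit, coef=coeffs, link=log, title=Differences in mean cost)
```

The point of going through the macro is that the differences are estimated on the cost scale, whereas DIFF on its own compares the groups on the log scale.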

uzma03505621
Obsidian | Level 7

Okay, I will try that. Thanks for the quick response 🙂

uzma03505621
Obsidian | Level 7

But in this model some of my independent variables are categorical and some are continuous. Will that make any difference to my output interpretation?

uzma03505621
Obsidian | Level 7

Also, out of the total 7081 patients, I have 24 patients with ZERO healthcare costs.

StatDave
SAS Super FREQ
Zero is not a valid value for a gamma distribution, which is defined only for strictly positive values. For data that contain zeros, you can use the Tweedie distribution, which is also available in GENMOD (DIST=TWEEDIE). You can fit that model and use the NLMeans macro in the same way as in the example in the macro documentation.
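
A minimal sketch of the Tweedie fit, assuming the data set and variable names from the original question; the item store name tweediefit is a placeholder:

```sas
/* Same linear predictor as the gamma model; only DIST= changes.
   Zero costs are valid responses under the Tweedie distribution. */
proc genmod data=studydata;
  class depressioncategory age gender insurancestatus;
  model cost = depressioncategory age gender insurancestatus income povertylevel
        / dist=tweedie link=log;
  store tweediefit;   /* placeholder item store name */
run;
```

The stored model can then be passed to PROC PLM and the macro exactly as with a gamma fit.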
uzma03505621
Obsidian | Level 7

I found this revised syntax for the gamma distribution, which is supposed to accommodate the ZERO observations as well. Is it correct?

 

PROC GENMOD;
A = _MEAN_;
B = _RESP_;
D = B/A + LOG(A);
VARIANCE VAR = A**2;
DEVIANCE DEV = D;
MODEL COST=X1 X2 X3 / LINK=LOG;
RUN;

StatDave
SAS Super FREQ

That is neither correct nor incorrect; it is just a different distribution. That code uses an alteration of the gamma deviance that removes the part of it that excludes nonpositive values. With this deviance definition, zero and even negative values are allowed. It's up to you to see if the resulting model suits your needs, but again, the more established Tweedie distribution that is directly supported in GENMOD might be a better solution.

uzma03505621
Obsidian | Level 7

(categ_mdd is my main categorical variable, which divides my sample into three categories.)

 

proc genmod data=data2;
class categ_mdd adult sex povcat inscov cobd1 cobd2 cobd3 cobd4 cobd5 cobd6;
model costp =categ_mdd adult sex marry povcat inscov region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4 mcs42 pcs42 /dist=gamma link=log type3;
store p1;
run;


proc plm restore=p1;
lsmeans categ_mdd / e ilink diff exp;
ods output coef=coeffs;
run;

 

I have attached my output. What would be the mean expenditure in each of my categories?

 

 

categ_mdd Least Squares Means

categ_mdd  Estimate  Standard Error  z Value  Pr > |z|  Mean   Standard Error of Mean  Exponentiated
1          9.7918    0.05134         190.71   <.0001    17887  918.39                  17887
2          9.5043    0.06243         152.25   <.0001    13417  837.59                  13417
3          9.5901    0.04185         229.18   <.0001    14619  611.72                  14619

StatDave
SAS Super FREQ

They are the values in the Mean column that you highlighted.
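
The Mean column is the inverse link applied to the Estimate column; with a log link that is simply exp(Estimate). A small DATA step, using only the numbers from the table above, illustrates this and the point estimates of the differences. It is an arithmetic check only and gives no standard errors or tests for the differences.

```sas
data _null_;
  /* Estimate column: LS-means on the log (link) scale */
  est1 = 9.7918;  est2 = 9.5043;  est3 = 9.5901;

  /* Inverse log link reproduces the Mean column, up to rounding */
  mean1 = exp(est1);  mean2 = exp(est2);  mean3 = exp(est3);

  /* Point estimates of the differences in mean cost */
  diff12 = mean1 - mean2;
  diff13 = mean1 - mean3;
  diff23 = mean2 - mean3;

  /* A difference on the log scale exponentiates to a ratio of means */
  ratio12 = exp(est1 - est2);

  /* Write the results to the log */
  put mean1= mean2= mean3= / diff12= diff13= diff23= / ratio12=;
run;
```

This is also why the Exponentiated column from the EXP option matches the Mean column here: for a log link, ILINK and EXP perform the same transformation on the LS-mean estimates.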

uzma03505621
Obsidian | Level 7

Okay, got it. One quick question: why are we using LS-means and not just means? Sorry, I am very new to all this.

StatDave
SAS Super FREQ
The means of your categories have to be computed at some setting of the other predictors in the model. The LS-means are linear combinations of the model parameters that, by default, set the continuous predictors at their means and weight the levels of the other CLASS predictors equally. You can see this by looking at the coefficients that are printed by the E option in the LSMEANS statement.
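
To see how the setting of the other predictors enters, one option is to request the LS-means twice from the stored model and compare the coefficient tables. This is only a sketch: it reuses the item store p1 from the earlier post, and the value 50 for mcs42 is an arbitrary illustration, not a recommendation.

```sas
proc plm restore=p1;
  /* Default setting; E prints the coefficient applied to each parameter */
  lsmeans categ_mdd / e ilink;

  /* Same LS-means with the covariate mcs42 fixed at 50 (arbitrary value) */
  lsmeans categ_mdd / e ilink at mcs42=50;
run;
```

In the two coefficient tables, only the entry for mcs42 should change, which is what shifts the estimated means.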
uzma03505621
Obsidian | Level 7
Oh! Now I understand it.
Thank you
