Hi everyone, I have a quick question about a generalized linear model with a gamma distribution and log link function.
For my research, my study sample consists of patients with hypertension, categorized by depression status as follows:
1) Hypertension + treated depression (n = 1150)
2) Hypertension + untreated depression (n = 702)
3) Hypertension + no depression (n = 5229)
I need to estimate the differences in healthcare expenditures (cost) among these groups. I used the following syntax:
proc genmod data=studydata;
class depressioncategory age gender insurancestatus;
model cost= depressioncategory age gender insurancestatus income povertylevel/ dist=gamma link=log;
output out= data2 pred=phat;
run;
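Because the model uses LINK=LOG, it describes the log of the mean cost, so each coefficient acts multiplicatively on the mean rather than additively. A short sketch of the algebra, where $\eta$ is the linear predictor, $\beta_1$ and $\beta_3$ are (hypothetically labeled) coefficients for depression categories 1 and 3, and $\mathbf{z}$ with coefficients $\boldsymbol{\gamma}$ stands for the other covariates held at the same values:

$$\log \mu_i = \eta_i = \mathbf{x}_i^\top \boldsymbol{\beta}, \qquad
\frac{\mu(\text{category } 1,\ \mathbf{z})}{\mu(\text{category } 3,\ \mathbf{z})}
= \frac{\exp(\beta_0 + \beta_1 + \mathbf{z}^\top \boldsymbol{\gamma})}{\exp(\beta_0 + \beta_3 + \mathbf{z}^\top \boldsymbol{\gamma})}
= \exp(\beta_1 - \beta_3)$$

Under GENMOD's default GLM parameterization of CLASS variables, the last level (in sort order) has its coefficient set to 0, so exp(β) for any other level is the ratio of its mean cost to that reference level's mean cost, holding the other covariates fixed. Ratios are easy to read off the coefficients this way; differences in dollars are not, which is where the follow-up answers below come in.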
Now, from this output, how should I interpret the differences in cost among my three categories? Please guide me.
You can use the NLMeans macro to estimate the differences in means among the categories. See the Results tab for an example that involves a gamma model. There are also several links to other examples.
Okay, I will try that. Thanks for the quick response! 🙂
But in this model, some of my independent variables are categorical and some are continuous. Will that make any difference to how I interpret the output?
Also, of the 7081 patients in total, 24 have ZERO healthcare costs.
I found the following revised syntax for the gamma distribution, which is supposed to accommodate the zero observations as well. Is it correct?
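PROC GENMOD;
   A = _MEAN_;              /* model-predicted mean for each observation */
   B = _RESP_;              /* observed response (COST) */
   D = B/A + LOG(A);        /* modified gamma deviance term; still defined when B = 0 */
   VARIANCE VAR = A**2;     /* gamma-type variance: proportional to the mean squared */
   DEVIANCE DEV = D;        /* use D as each observation's deviance contribution */
   MODEL COST = X1 X2 X3 / LINK=LOG;
RUN;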
That is neither correct nor incorrect; it is just a different distribution. That code uses an altered gamma deviance that drops the term which rules out nonpositive values. With this deviance definition, zero and even negative values are allowed. It's up to you to decide whether the resulting model suits your needs, but again, the more established Tweedie distribution, which GENMOD supports directly, might be a better solution.
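A minimal sketch of the Tweedie alternative, reusing the data set and variables from the first model in this thread. DIST=TWEEDIE is a GENMOD option, and a Tweedie distribution with power between 1 and 2 places positive probability on exactly zero, so the 24 zero-cost patients can stay in the analysis. Whether it fits these data better than the gamma model is something to check with residuals and fit diagnostics rather than assume; the item store name tweedmod is hypothetical.

```sas
/* Tweedie GLM with log link: allows exact zeros in COST */
proc genmod data=studydata;
   class depressioncategory age gender insurancestatus;
   model cost = depressioncategory age gender insurancestatus income povertylevel
         / dist=tweedie link=log;
   store tweedmod;   /* hypothetical item store name, for later use with PROC PLM */
run;
```

The STORE statement keeps the fitted model so the same PROC PLM and LSMEANS steps used later in the thread can be run against it.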
(categ_mdd is my main categorical variable; it divides my sample into the three categories.)
proc genmod data=data2;
class categ_mdd adult sex povcat inscov cobd1 cobd2 cobd3 cobd4 cobd5 cobd6;
model costp =categ_mdd adult sex marry povcat inscov region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4 mcs42 pcs42 /dist=gamma link=log type3;
store p1;
run;
proc plm restore=p1;
lsmeans categ_mdd / e ilink diff exp;
ods output coef=coeffs;
run;
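The E option, the ILINK option, and ODS OUTPUT COEF= in the step above are the set-up the NLMeans macro works from. A hedged sketch of the follow-up call, assuming the macro has already been downloaded from SAS and compiled in the session (for example with %INCLUDE), and that it accepts the INSTORE=, COEF=, and LINK= parameters shown; check the macro's own documentation for the exact parameter names and options before relying on this.

```sas
/* Assumes the NLMeans macro is already defined in this SAS session.      */
/* instore= : item store saved by the STORE statement in PROC GENMOD (p1) */
/* coef=    : LSMEANS coefficient data set saved by ODS OUTPUT (coeffs)   */
/* link=    : the link used in the model, so means are on the cost scale  */
%NLMeans(instore=p1, coef=coeffs, link=log, title=Differences in mean cost by categ_mdd)
```

The point of going through the macro is that it works on the cost scale directly, so the differences it reports come with standard errors that account for the covariance between the category estimates.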
I have attached my output. What would be the mean expenditure in each of my categories?
categ_mdd Least Squares Means

| categ_mdd | Estimate | Standard Error | z Value | Pr > \|z\| | Mean | Standard Error of Mean | Exponentiated |
|---|---|---|---|---|---|---|---|
| 1 | 9.7918 | 0.05134 | 190.71 | <.0001 | 17887 | 918.39 | 17887 |
| 2 | 9.5043 | 0.06243 | 152.25 | <.0001 | 13417 | 837.59 | 13417 |
| 3 | 9.5901 | 0.04185 | 229.18 | <.0001 | 14619 | 611.72 | 14619 |
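How the columns in this table relate: with a log link, the Estimate column is on the log scale, and both Mean (from ILINK) and Exponentiated (from EXP) are its exponential, which is why those two columns are identical here. A worked check on the categ_mdd = 1 row, where the second expression is the delta-method standard error and matches the Standard Error of Mean column up to rounding:

$$\hat\mu_1 = e^{\hat\eta_1} = e^{9.7918} \approx 17887, \qquad
\widehat{\mathrm{SE}}(\hat\mu_1) \approx e^{\hat\eta_1}\,\widehat{\mathrm{SE}}(\hat\eta_1) = 17887 \times 0.05134 \approx 918$$

The ratio of two category means is the exponentiated difference of their estimates, for example

$$\frac{\hat\mu_1}{\hat\mu_3} = e^{9.7918 - 9.5901} = e^{0.2017} \approx 1.22$$

so the adjusted mean cost in category 1 is about 22% higher than in category 3. The dollar difference, 17887 − 14619 = 3268, is easy to compute from the table, but its standard error depends on the covariance between the two estimates, which this table does not show; that is what the NLMeans macro supplies.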
They are the values in the Mean column that you highlighted.
Okay, got it. One more quick question: why are we using LSMEANS and not just plain means? I am sorry, I am very new to all this.
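A sketch to make the distinction concrete. Plain (arithmetic) means come straight from the data and are unadjusted, so they also reflect differences among the groups in age, sex, comorbidities, and the other covariates. The LS-means are model-based: each is the back-transformed linear predictor for that category, computed with continuous covariates such as mcs42 and pcs42 held at their overall means and with equal weight given to the levels of the other CLASS variables. This also bears on the earlier question about mixing categorical and continuous predictors. The step below assumes the data set data2 and the variables categ_mdd and costp from the model above.

```sas
/* Unadjusted (raw) mean cost per depression category, */
/* for comparison with the model-based LS-means        */
proc means data=data2 n mean stderr;
   class categ_mdd;
   var costp;
run;
```

If the raw means and the LS-means differ noticeably, that difference is roughly the part of the raw gap explained by the covariates in the model.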