Solved: Re: generalized linear model for healthcare expenditures

uzma03505621 · Posted 03-05-2021 11:52 AM

Hi everyone, I have a quick question about the generalized linear model with gamma distribution and log link function.

For my research, my study sample includes patients with hypertension and they are categorized as below based on depression status:

1) Hypertension + treated depression (n=1150)
2) Hypertension + untreated depression (n=702)
3) Hypertension + no depression (n=5229)

I have to estimate the differences in the healthcare expenditures (cost) in these groups. I have used the following syntax.

proc genmod data=studydata;
class depressioncategory age gender insurancestatus;
model cost = depressioncategory age gender insurancestatus income povertylevel / dist=gamma link=log;
output out=data2 pred=phat;
run;

Now from this output, how can I interpret the differences in the costs in my three categories? Please guide me.

StatDave · Posted 03-05-2021 11:57 AM

You can use the NLMeans macro to estimate the differences in means among the categories. See the Results tab of the macro documentation for an example that involves a gamma model. There are also several links to other examples.
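A minimal sketch of that workflow, applied to the model from the original post. It assumes the NLMeans macro (and the NLEST macro that, as far as I recall, it relies on) has already been downloaded and defined in the session. The macro parameter names are written as I remember them from the macro documentation, so check them against the version you download. The item-store name gammod is made up for illustration.

```sas
/* Fit the gamma model from the original post and save what NLMeans needs. */
proc genmod data=studydata;
   class depressioncategory age gender insurancestatus;
   model cost = depressioncategory age gender insurancestatus
                income povertylevel / dist=gamma link=log;
   lsmeans depressioncategory / e ilink;  /* E prints the LS-means coefficients */
   ods output coef=coeffs;                /* save those coefficients to a data set */
   store gammod;                          /* hypothetical item-store name */
run;

/* Assumption: NLMeans (and NLEST) are already defined, e.g. via %include. */
%NLMeans(instore=gammod, coef=coeffs, link=log,
         title=Differences in mean cost by depression category)
```

The macro reports the differences on the cost scale rather than on the log scale, which is usually what is wanted for expenditures.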

uzma03505621 · Posted 03-05-2021 12:22 PM

okay, I will try that and thanks for the quick response 🙂

uzma03505621 · Posted 03-09-2021 10:04 PM

But in this model some of my independent variables are categorical and some are continuous. Will that make any difference to my output interpretation?

uzma03505621 · Posted 03-05-2021 11:57 AM

Also, out of the total 7081 patients, I have 24 patients with ZERO healthcare costs.

StatDave · Posted 03-05-2021 12:10 PM

Zero is not a valid value for a gamma distribution. For data that contain zeros, you can use the Tweedie distribution, which is also available in GENMOD (DIST=TWEEDIE). You can fit that model and use the NLMeans macro in the same way as in the example in the macro documentation.
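As a rough sketch, the only change to the model from the original post would be the DIST= option. The data set and variable names are taken from that post; the item-store name tweedmod is an assumption made for illustration.

```sas
/* Same predictors as the original gamma model, refit with a Tweedie
   response so the 24 zero-cost patients can stay in the analysis. */
proc genmod data=studydata;
   class depressioncategory age gender insurancestatus;
   model cost = depressioncategory age gender insurancestatus
                income povertylevel / dist=tweedie link=log;
   store tweedmod;   /* hypothetical item-store name, for later PROC PLM or NLMeans use */
run;
```

Because the link is still the log, the fitted means are interpreted the same way as in the gamma model.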

uzma03505621 · Posted 03-09-2021 10:19 PM

I found this revised syntax for the gamma distribution, which is supposed to accommodate the zero observations as well. Is it correct?

PROC GENMOD;
A = _MEAN_;
B = _RESP_;
D = B/A + LOG(A);
VARIANCE VAR = A**2;
DEVIANCE DEV = D;
MODEL COST=X1 X2 X3 / LINK=LOG;
RUN;

StatDave · Posted 03-10-2021 10:37 AM

That is neither correct nor incorrect; it is just a different distribution. That code uses an alteration of the gamma deviance that removes the part of it that excludes nonpositive values. With this deviance definition, zero and even negative values are allowed. It's up to you to see if the resulting model suits your needs, but again, the more established Tweedie distribution that is directly supported in GENMOD might be a better solution.
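To make the alteration concrete, here is a short derivation using the standard gamma unit deviance, with y for the response (_RESP_) and \mu for the mean (_MEAN_). The symbols d_gamma and d_alt are introduced here for illustration only.

```latex
% Standard gamma unit deviance (requires y > 0 because of the log y term):
\[
d_{\gamma}(y,\mu)
  = 2\left[\frac{y-\mu}{\mu} - \log\frac{y}{\mu}\right]
  = 2\left[\frac{y}{\mu} + \log\mu - 1 - \log y\right]
\]
% Deviance defined in the posted code (D = B/A + LOG(A)):
\[
d_{\text{alt}}(y,\mu) = \frac{y}{\mu} + \log\mu
\]
% The factor 2 and the terms -1 - \log y, which do not involve \mu, are dropped.
% The derivative with respect to \mu is the same up to that factor of 2:
\[
\frac{\partial d_{\text{alt}}}{\partial \mu} = \frac{\mu - y}{\mu^{2}}
\]
```

Since the log y term is the only part that is undefined at y = 0, dropping it is what lets zero (and negative) responses through, while the variance function stays at \mu^2 as in the VARIANCE statement.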

uzma03505621 · Posted 03-10-2021 04:58 PM

(categ_mdd is my main categorical variable that divides my sample into three categories.)

proc genmod data=data2;
class categ_mdd adult sex povcat inscov cobd1 cobd2 cobd3 cobd4 cobd5 cobd6;
model costp =categ_mdd adult sex marry povcat inscov region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4 mcs42 pcs42 /dist=gamma link=log type3;
store p1;
run;

proc plm restore=p1;
lsmeans categ_mdd / e ilink diff exp;
ods output coef=coeffs;
run;

I have attached my output. What would be the mean expenditure in each of my categories?

categ_mdd Least Squares Means
categ_mdd	Estimate	Standard Error	z Value	Pr > |z|	Mean	Standard Error of Mean	Exponentiated
1	9.7918	0.05134	190.71	<.0001	17887	918.39	17887
2	9.5043	0.06243	152.25	<.0001	13417	837.59	13417
3	9.5901	0.04185	229.18	<.0001	14619	611.72	14619

StatDave · Posted 03-10-2021 05:09 PM

They are the values in the Mean column that you highlighted.
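For anyone reading the table later, a worked example of how the columns relate under the log link, using only the numbers posted above. The subscripts 1, 2, 3 refer to the categ_mdd levels; results are rounded.

```latex
% Mean column = inverse link applied to the Estimate column:
\[
\text{Mean}_k = \exp(\text{Estimate}_k), \qquad \exp(9.7918) \approx 17887
\]
% A difference of estimates on the log scale is a ratio of means:
\[
\frac{\text{Mean}_1}{\text{Mean}_2} = \exp(9.7918 - 9.5043) \approx 1.33, \quad
\frac{\text{Mean}_1}{\text{Mean}_3} = \exp(9.7918 - 9.5901) \approx 1.22, \quad
\frac{\text{Mean}_2}{\text{Mean}_3} = \exp(9.5043 - 9.5901) \approx 0.92
\]
% Differences of means on the cost scale:
\[
\text{Mean}_1 - \text{Mean}_2 = 4470, \qquad
\text{Mean}_1 - \text{Mean}_3 = 3268, \qquad
\text{Mean}_2 - \text{Mean}_3 = -1202
\]
```

The subtraction gives only the point estimates of the differences. Standard errors and tests for those differences on the cost scale need something like the NLMeans macro mentioned earlier in the thread.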

uzma03505621 · Posted 03-10-2021 05:29 PM

Okay, got it. Just a quick question: why are we using lsmeans and not just means? I am sorry, I am very new to all this.

StatDave · Posted 03-11-2021 02:21 PM

The means of your categories have to be computed at some setting of the other predictors in the model. The LS-means are linear combinations of the model parameters that, by default, set the continuous predictors at their means and average equally over the levels of the other CLASS predictors. You can see this by looking at the coefficients that are printed by the E option in the LSMEANS statement.
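A small sketch of how the setting can be inspected or changed, reusing the item store p1 from the code above. The value 50 for mcs42 is an arbitrary illustrative choice, not something taken from the data.

```sas
proc plm restore=p1;
   /* Default setting: continuous covariates such as mcs42 and pcs42 are
      held at their means. The E option prints the coefficients used. */
   lsmeans categ_mdd / e ilink;

   /* Hypothetical alternative: hold mcs42 at 50 instead of its mean. */
   lsmeans categ_mdd / e ilink at mcs42=50;
run;
```

Comparing the two coefficient tables shows that only the entry for mcs42 changes, which is why the LS-means are best read as adjusted means at a stated covariate profile rather than as raw group averages.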

uzma03505621 · Posted 03-11-2021 02:53 PM

Oh! Now I understand it.
Thank you
