## Two-part/hurdle model for healthcare costs/expenditures as dependent variable

In a two-part (hurdle) model with costs/expenditures as the dependent variable, expected expenditures are obtained by multiplying the predictions from the two parts.
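In symbols, writing $Y$ for cost and $X$ for the covariates, this follows from the law of total expectation, because observations with zero cost contribute nothing to the mean:

$$
E[Y \mid X] = \Pr(Y > 0 \mid X)\, E[Y \mid Y > 0, X] + \Pr(Y = 0 \mid X)\cdot 0 = \Pr(Y > 0 \mid X)\, E[Y \mid Y > 0, X]
$$

The first factor is what the logistic part estimates; the second is what the GLM fitted to positive costs estimates.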

My study model is as follows:

cost = x1, x2, x3, x4, x5, x6, x7, x8 (categorical), x9, x10 (continuous)

x1 is my main classification variable; it divides my sample into 3 groups. The purpose of my study is to estimate healthcare expenditures in these 3 groups and the differences in expenditures between them.

First part: logistic regression estimating the probability of positive expenditures.

I created an indicator variable, ispositive, which is 1 when expenditure is positive and 0 when no expenditure is incurred.

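SAS code:

```sas
data data2;
  set data1;
  /* ispositive = 1 if cost > 0, 0 if cost = 0, missing if cost is missing */
  if cost ne . then ispositive = (cost > 0);
  /* logcost and costp are set only for positive costs; otherwise they stay missing */
  if ispositive then do;
    logcost = log(cost);
    costp = cost;
  end;
run;
```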

For the first part of my model, I use a logit model:

```sas
proc logistic data=data2 descending;
  /* these are categorical variables */
  class categ_mdd sex adult povcat inscov marry region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4;
  model ispositive = categ_mdd adult sex marry povcat inscov region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4 mcs42 pcs42 / firth;
  output out=predlogistic pred=phat;
run;
```

For the second part of the model, I use a GLM with a log link and gamma distribution, fitted only to positive costs (COSTP):

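```sas
proc genmod data=data2;
  /* List every categorical covariate, as in part 1; variables left out */
  /* of CLASS are treated as continuous.                                */
  class categ_mdd sex adult povcat inscov marry region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4;
  /* costp is missing when cost = 0, so only positive costs enter the fit */
  model costp = categ_mdd adult sex marry povcat inscov region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4 mcs42 pcs42 / dist=gamma link=log;
  output out=y_hatC pred=condpred;
run;
```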

My question now is: how do I obtain the expected cost by multiplying the predictions from the two parts?

What should I multiply?

Is the expected cost = phat*condpred?

## Re: Two-part/hurdle model for healthcare costs/expenditures as dependent variable

That is probably reasonable, but as I've suggested before, this complication goes away if you fit a Tweedie model. That is a single model and you just ask for the predicted values from the OUTPUT statement in GENMOD as usual.
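For reference, phat*condpred matches the formula E[cost] = Pr(cost > 0) × E[cost | cost > 0]. A minimal sketch of the product, assuming the data set and variable names from the question (data2, ispositive, costp, categ_mdd, and the covariates): feeding the PROC LOGISTIC output data set into PROC GENMOD carries phat along, so both predictions land on the same rows and no merge is needed. The macro variables classvars and covars and the data set name twopart are hypothetical helpers; the shared lists keep the two parts on the same specification.

```sas
/* Hypothetical macro variables: one CLASS list and one covariate list */
/* shared by both parts so the two models use the same specification. */
%let classvars = categ_mdd adult sex marry povcat inscov region
                 cobd1 cobd2 cobd3 cobd4 cobd5 cobd6 cobd7 cobd8 cobd9;
%let covars    = &classvars mcs42 pcs42;

/* Part 1: Pr(cost > 0); DESCENDING models ispositive = 1. */
proc logistic data=data2 descending;
  class &classvars;
  model ispositive = &covars / firth;
  output out=predlogistic pred=phat;   /* also keeps all variables of data2 */
run;

/* Part 2: E[cost | cost > 0]. Rows with zero cost have costp missing, */
/* so they are excluded from the fit but still receive a prediction.   */
proc genmod data=predlogistic;
  class &classvars;
  model costp = &covars / dist=gamma link=log;
  output out=twopart pred=condpred;    /* phat is carried along */
run;

/* Unconditional expected cost = Pr(cost > 0) * E[cost | cost > 0]. */
data twopart;
  set twopart;
  predcost = phat * condpred;
run;

proc means data=twopart n mean std min max;
  class categ_mdd;
  var predcost;
run;
```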

## Re: Two-part/hurdle model for healthcare costs/expenditures as dependent variable

Yes, I will try the Tweedie distribution. But just to be sure about the two-part model: once I have the predicted cost, I'll use LSMEANS to get the mean expenditure for each category, right?
## Re: Two-part/hurdle model for healthcare costs/expenditures as dependent variable

I used the following syntax to get mean expenditures:

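```sas
/* One-to-one merge: assumes both OUTPUT data sets keep the rows of data2 in the same order */
data estimate;
  merge y_hatc predlogistic;
  predcost = phat*condpred;   /* unconditional expected cost */
run;

/* CLASS, unlike BY, does not require estimate to be sorted by categ_mdd */
proc means data=estimate;
  class categ_mdd;
  var predcost;
run;
```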

Analysis Variable: predcost

| categ_mdd | N | Mean | Std Dev | Minimum | Maximum |
|---|---|---|---|---|---|
| 1 | 1125 | 23338.54 | 13994.49 | 3179.75 | 167320.80 |
| 2 | 627 | 20472.25 | 10540.79 | 2622.02 | 65667.96 |
| 3 | 5236 | 13527.46 | 9546.90 | 2103.80 | 183025.71 |
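Averaging predcost within the observed categ_mdd groups mixes the group effect with differences in the other covariates between groups. LSMEANS works on one fitted model at a time, so by itself it does not combine the two parts. One common alternative is the method of recycled predictions: assign every person to each level of categ_mdd in turn, predict from both parts, and average. A minimal sketch using the STORE statement and PROC PLM, assuming categ_mdd is numeric and coded 1 to 3; the macro variables classvars and covars and the data set names part1, part2, cf, cf1, and cf2 are hypothetical.

```sas
/* Hypothetical helper lists (same covariates as in the question). */
%let classvars = categ_mdd adult sex marry povcat inscov region
                 cobd1 cobd2 cobd3 cobd4 cobd5 cobd6 cobd7 cobd8 cobd9;
%let covars    = &classvars mcs42 pcs42;

/* Fit each part once and save it as an item store for PROC PLM. */
proc logistic data=data2 descending;
  class &classvars;
  model ispositive = &covars / firth;
  store out=part1;
run;

proc genmod data=data2;
  class &classvars;
  model costp = &covars / dist=gamma link=log;
  store out=part2;
run;

/* Counterfactual data: each person appears once per categ_mdd level, */
/* with all other covariates left at their observed values.           */
/* Assumes categ_mdd is numeric and coded 1, 2, 3.                    */
data cf;
  set data2;
  do categ_mdd = 1 to 3;
    output;
  end;
run;

/* Score both parts; ILINK returns a probability and a cost */
/* rather than values on the logit or log scale.            */
proc plm restore=part1;
  score data=cf out=cf1 predicted=phat / ilink;
run;

proc plm restore=part2;
  score data=cf1 out=cf2 predicted=condpred / ilink;
run;

data cf2;
  set cf2;
  predcost = phat * condpred;
run;

/* Recycled-prediction mean cost for each level of categ_mdd. */
proc means data=cf2 mean;
  class categ_mdd;
  var predcost;
run;
```

Differences between the resulting group means give covariate-adjusted differences in expected cost; standard errors for them would need a bootstrap or similar, which this sketch does not attempt.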
## Re: Two-part/hurdle model for healthcare costs/expenditures as dependent variable

Hi, can you please tell us how to fit a Tweedie model and how to get the predicted values?

Thank you!

## Re: Two-part/hurdle model for healthcare costs/expenditures as dependent variable

Just specify DIST=TWEEDIE in the MODEL statement and the PRED= option in the OUTPUT statement.

```sas
proc genmod;
model y = x / dist=tweedie;
output out=twpred pred=p;
run;
```
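Applied to the model in the question, a minimal sketch might look like the following. It assumes the same data2 data set and covariates; unlike the two-part model, the response is cost itself, zeros included, so PRED= returns the unconditional expected cost directly. The log link is written out explicitly, and the macro variable classvars and the data set name twpred2 are hypothetical.

```sas
/* Hypothetical helper list of the categorical covariates. */
%let classvars = categ_mdd adult sex marry povcat inscov region
                 cobd1 cobd2 cobd3 cobd4 cobd5 cobd6 cobd7 cobd8 cobd9;

/* One model for all costs, including zeros. */
proc genmod data=data2;
  class &classvars;
  model cost = &classvars mcs42 pcs42 / dist=tweedie link=log;
  output out=twpred2 pred=predcost;   /* expected cost on the data scale */
run;

/* Mean predicted cost within each observed categ_mdd group. */
proc means data=twpred2 n mean std min max;
  class categ_mdd;
  var predcost;
run;
```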
