uzma03505621
Obsidian | Level 7

In a two-part (hurdle) model with costs/expenditures as the dependent variable, expected expenditures are obtained by multiplying the predictions from the two parts.

My study model is as follows:

cost = x1, x2, x3, x4, x5, x6, x7, x8 (categorical), x9, x10 (continuous)

x1 is my main classification variable that divides my sample into 3 groups; my study purpose is to estimate the healthcare expenditures in these 3 groups and the differences in their healthcare expenditures.

 

1st part:  logistic regression estimating the probability of positive expenditures

I created an indicator variable ispositive which is 1 for positive expenditure and 0 when no expenditure is incurred.

SAS code:

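data data2;
  set data1;
  /* 1 = positive expenditure, 0 = none; stays missing when cost is missing */
  if cost ne . then ispositive = (cost > 0);
  /* logcost and costp are defined only for positive costs */
  if ispositive then do;
    logcost = log(cost);
    costp = cost;
  end;
run;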

 

For the first part of my model, I use a logit model:

proc logistic data=data2 descending;
 class categ_mdd sex adult povcat inscov marry region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4;/*these are categorical variables */
 model ispositive = categ_mdd adult sex marry povcat inscov region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4 mcs42 pcs42/firth;
 output out=predlogistic pred=phat;
run;

 

For the second part of the model, I use a GLM with a log link and a gamma distribution, fitted to positive costs only (COSTP):

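proc genmod data=data2;
  /* same categorical variables as in the PROC LOGISTIC step */
  class categ_mdd adult sex marry povcat inscov region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4;
  /* costp is missing when cost is 0, so only positive costs enter the fit */
  model costp = categ_mdd adult sex marry povcat inscov region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4 mcs42 pcs42 / dist=gamma link=log;
  output out=y_hatC pred=condpred;
run;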

 

My question is: how do I obtain the expected cost by multiplying the predictions from the two parts?

What exactly should I multiply?

Is the expected cost = phat*condpred?
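
For reference, the product in question follows from the law of total expectation. This is only a sketch, assuming cost is non-negative and both parts are evaluated at the same covariate values; the symbols \hat{p}_i and \hat{\mu}_i are notation introduced here for phat and condpred, not anything produced by SAS.

```latex
% Cost Y is non-negative with a point mass at zero, so the zero branch drops out:
\begin{align*}
E[Y \mid x] &= \Pr(Y = 0 \mid x)\cdot 0 \;+\; \Pr(Y > 0 \mid x)\, E[Y \mid Y > 0,\, x] \\
            &= \Pr(Y > 0 \mid x)\, E[Y \mid Y > 0,\, x]
\end{align*}
% Plugging in the fitted values from the two parts for observation i:
\[
\widehat{E}[Y_i \mid x_i] = \hat{p}_i \,\hat{\mu}_i ,
\qquad
\hat{p}_i = \texttt{phat}\ \text{(logistic part)},
\quad
\hat{\mu}_i = \texttt{condpred}\ \text{(gamma GLM, log link)}
\]
```

Under these assumptions, phat*condpred is the per-observation estimate of expected cost, provided condpred is available for every observation and not only for those with positive cost.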

StatDave
SAS Super FREQ
That is probably reasonable, but as I've suggested before, this complication goes away if you fit a Tweedie model. That is a single model and you just ask for the predicted values from the OUTPUT statement in GENMOD as usual.
uzma03505621
Obsidian | Level 7
Yes, I will try the Tweedie distribution.
But just to be sure about the two-part model: once I have the predicted cost, I would use LSMEANS to get the mean expenditures for each category, right?
uzma03505621
Obsidian | Level 7

I used the following syntax to get the mean expenditures:

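data estimate;
  /* one-to-one merge (no BY statement): relies on both output data sets keeping the row order of data2 */
  merge y_hatc predlogistic;
  predcost = phat*condpred;  /* P(cost>0) * E(cost | cost>0) */
run;

/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=estimate;
  by categ_mdd;
run;

proc means data=estimate;
  var predcost;
  by categ_mdd;
run;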

Analysis Variable : predcost

categ_mdd      N       Mean    Std Dev    Minimum     Maximum
1           1125   23338.54   13994.49    3179.75   167320.80
2            627   20472.25   10540.79    2622.02    65667.96
3           5236   13527.46    9546.90    2103.80   183025.71


Raymond-milan
Calcite | Level 5

Hi, can you please show us how to fit a model with a Tweedie distribution and how to get the predicted values?

Thank you!

StatDave
SAS Super FREQ

Just specify DIST=TWEEDIE in the MODEL statement and the PRED= option in the OUTPUT statement.

 

proc genmod;
model y = x / dist=tweedie;
output out=twpred pred=p; 
run;
