For a two-part (hurdle) model with costs/expenditures as the dependent variable, expected expenditures are obtained by multiplying the predictions from the two parts.
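In symbols, the product of the two parts can be sketched as follows. Here x stands for the covariate vector (my shorthand, not a variable in the data); in the SAS code further down, phat estimates the first factor and condpred the second.

```latex
E[\text{cost} \mid x] \;=\; \Pr(\text{cost} > 0 \mid x) \times E[\text{cost} \mid \text{cost} > 0,\, x]
```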
My study model is as follows:
cost = x1, x2, x3, x4, x5, x6, x7, x8 (categorical), x9, x10 (continuous)
x1 is my main classification variable, which divides my sample into 3 groups; the purpose of my study is to estimate healthcare expenditures in these 3 groups and the differences between them.
1st part: logistic regression estimating the probability of positive expenditures
I created an indicator variable ispositive which is 1 for positive expenditure and 0 when no expenditure is incurred.
SAS code:
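data data2;
  set data1;
  /* indicator: 1 = positive expenditure, 0 = no expenditure; missing if cost is missing */
  if cost ne . then ispositive = (cost > 0);
  if ispositive then do;
    logcost = log(cost);
    costp = cost;  /* positive costs only, used in the second part */
  end;
run;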
For the first part of my model, I use PROC LOGISTIC:
proc logistic data=data2 descending;
  class categ_mdd sex adult povcat inscov marry region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4; /* these are categorical variables */
  model ispositive = categ_mdd adult sex marry povcat inscov region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4 mcs42 pcs42 / firth;
  output out=predlogistic pred=phat;
run;
For the second part of the model, I use a GLM with a log link and gamma distribution, fitted to positive costs only (COSTP):
proc genmod data=data2;
  class categ_mdd adult sex povcat inscov cobd1 cobd2 cobd3 cobd4 cobd5 cobd6;
  model costp = categ_mdd adult sex marry povcat inscov region cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4 mcs42 pcs42 / dist=gamma link=log;
  output out=y_hatC pred=condpred;
run;
My question is: how do I obtain the expected cost by multiplying the predictions from the two parts?
What should I multiply?
Is the expected cost = phat*condpred?
I used the following syntax to get mean expenditures:
data estimate;
  merge y_hatc predlogistic;
  predcost = phat*condpred;
run;
proc means data=estimate;
  var predcost;
  by categ_mdd;
run;
Analysis Variable : predcost
categ_mdd | N | Mean | Std Dev | Minimum | Maximum
1 | 1125 | 23338.54 | 13994.49 | 3179.75 | 167320.80
2 | 627 | 20472.25 | 10540.79 | 2622.02 | 65667.96
3 | 5236 | 13527.46 | 9546.90 | 2103.80 | 183025.71
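As a rough sanity check on the combined predictions, the group means of predcost can be set beside the observed group means of cost. This is a minimal sketch, assuming the estimate dataset still carries the original cost variable (OUTPUT datasets keep the input variables by default). It uses CLASS rather than BY so that no prior sort is needed.

```sas
/* Sketch: compare observed and predicted mean cost within each group */
proc means data=estimate n mean;
  class categ_mdd;      /* grouping without requiring sorted input */
  var cost predcost;    /* observed cost next to phat*condpred */
run;
```

Large gaps between the two means within a group would suggest taking another look at one of the two parts.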
Hi, can you please let us know how to specify a Tweedie distribution and how to get predicted values?
Thank you!
Just specify DIST=TWEEDIE in the MODEL statement and the PRED= option in the OUTPUT statement.
proc genmod;
model y = x / dist=tweedie;
output out=twpred pred=p;
run;
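If predictions are also needed for observations that were not in the fitting data, one possible approach is to save the fitted model and score a second dataset. The sketch below assumes that the STORE statement and PROC PLM are available in your SAS/STAT release; mydata, newdata, twmodel and scored are hypothetical names used only for illustration.

```sas
/* Sketch only: mydata, newdata, twmodel and scored are hypothetical names */
proc genmod data=mydata;
  model y = x / dist=tweedie link=log;
  store twmodel;   /* save the fitted model in an item store */
run;

proc plm restore=twmodel;
  /* ILINK requests predictions on the response scale instead of the log scale */
  score data=newdata out=scored predicted=p / ilink;
run;
```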