For a two-part (hurdle) model with costs/expenditures as the dependent variable, expected expenditures are obtained by multiplying the predictions from the two parts.

My model is:

    cost = x1 x2 x3 x4 x5 x6 x7 x8 (categorical) x9 x10 (continuous)

Here x1 is my main classification variable, which divides my sample into 3 groups. The purpose of my study is to estimate healthcare expenditures in these 3 groups and the differences in expenditures between them.

First part: a logistic regression estimating the probability of positive expenditures. I created an indicator variable ispositive, which is 1 when expenditure is positive and 0 when no expenditure is incurred. SAS code:

    data data2;
       set data1;
       /* indicator is left missing when cost is missing */
       if cost ne . then ispositive = (cost > 0);
       /* logcost and costp are defined only for positive costs */
       if ispositive then do;
          logcost = log(cost);
          costp = cost;
       end;
    run;

For the first part of the model I use a logit:

    proc logistic data=data2 descending;
       /* these are categorical variables */
       class categ_mdd sex adult povcat inscov marry region
             cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4;
       model ispositive = categ_mdd adult sex marry povcat inscov region
                          cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4
                          mcs42 pcs42 / firth;
       output out=predlogistic pred=phat;
    run;

For the second part of the model I use a GLM with a log link and gamma distribution, fitted to positive costs only (costp):

    proc genmod data=data2;
       /* same categorical variables as in part 1; marry, region and
          cobd7-cobd9 were missing from this list and would otherwise
          have been treated as continuous */
       class categ_mdd adult sex povcat inscov marry region
             cobd1 cobd2 cobd3 cobd4 cobd5 cobd6 cobd7 cobd8 cobd9;
       model costp = categ_mdd adult sex marry povcat inscov region
                     cobd1 cobd5 cobd6 cobd7 cobd8 cobd9 cobd2 cobd3 cobd4
                     mcs42 pcs42 / dist=gamma link=log;
       output out=y_hatC pred=condpred;
    run;

My question: how do I obtain the expected cost by multiplying the predictions from the two parts? What exactly should I multiply? Is the expected cost = phat*condpred?
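To make the question concrete, here is a minimal sketch of the multiplication I have in mind. It assumes that predlogistic and y_hatC each contain every observation of data2 in the original order, so that a one-to-one merge lines the rows up, and that condpred is also available for observations with zero cost. The names expected and expcost are hypothetical, introduced only for this illustration.

```sas
/* Sketch only: combine the two parts row by row.
   Assumes both OUTPUT data sets have the same rows, in the same order,
   as data2 (one-to-one merge, no BY statement). */
data expected;
   merge predlogistic y_hatC(keep=condpred);
   /* P(cost > 0 | x) times E(cost | cost > 0, x) */
   expcost = phat * condpred;
run;

/* Mean expected cost within each of the 3 groups defined by categ_mdd */
proc means data=expected mean;
   class categ_mdd;
   var expcost;
run;
```

If the row order cannot be relied on, an explicit ID variable and a BY merge would be the safer design.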
