Hi,

I am working on a project to examine costs using a nationally representative database, the Medical Expenditure Panel Survey (MEPS). MEPS has a complex survey design with weight, cluster, and strata variables. I used OLS with the Duan smearing estimator (used when retransforming from the logarithmic scale to the original scale, to avoid retransformation bias) to analyze costs. For one predictor variable with four categories, I am getting non-overlapping 95% CIs, but the p-value is > 0.05. My understanding is that if the 95% CIs do not overlap, then the p-value should be < 0.05 and the difference should be statistically significant. Also, the Duan smearing factor typically lies between 1 and 4, but for one category of the key predictor I am getting 5.1.

Here's what I have done so far:

1) Since the cost distribution is non-normal and heteroskedastic, I transformed the expenditures to a logarithmic scale.
2) I ran OLS, adjusting for confounders and interaction terms, to get the residuals and predicted values needed for the Duan smearing factor.
3) Once I had the smearing factor for each category of the key predictor variable (ins), I multiplied it by the exponential of the mean log expenditures to get the $ value. (The mean of the log expenditures is not reported here.)

************************
*SAS code*
************************;

/*All explanatory variables are categorical*/
Proc sort data=mylib.finalcohort;
  by ins;
run;

Proc surveyreg data=mylib.finalcohort;
  class ins age17x race education income marital_status cancertype
        nphycomorb nmentalcomorb disability_status;
  model log_totalexp = ins age17x race education income marital_status
        cancertype nphycomorb nmentalcomorb disability_status
        ins*age17x ins*race ins*education ins*income ins*marital_status
        ins*cancertype ins*nphycomorb ins*nmentalcomorb
        ins*disability_status / clparm solution;
  WEIGHT PERWT17F;
  STRATA VARSTR / nocollapse;
  CLUSTER VARPSU;
  output out=demo5 residual=RESID Pred=predict;
  format age17x age. income income. race racex. marital_status marry.
         education educatn. nphycomorb phycond. nmentalcomorb mntlcond.
         ins ins. cancertype cancertype. disability_status disabled.;
run;

**********************************
*Estimating duan smearing factor*
**********************************;

/*Sort the data by exposure or grouping variable*/
proc sort data=demo5 out=ins5;
  by ins;
run;

/*exponentiate the residuals*/
data resid1;
  set ins5;
  by ins;
  exp_residual = exp(Resid);
run;

Proc sort data=resid1;
  by ins;
run;

/*mean of the exponentiated residuals within each ins category (the smearing factor)*/
Proc summary data=resid1;
  var exp_residual;
  by ins;
  output out=smeared5 (keep=smeared ins) mean(exp_residual)=smeared;
run;

PROC FREQ data=smeared5;
  TABLES ins*smeared;
run;

Results:

Ins            Smearing factor   Total expenditures ($)   95% CI               p-value
A (reference)  2.19              13,238                   (11,729, 14,956.5)   .
B              2.36              17,812                   (17,812, 20,103)     0.55
C              5.4               14,276                   (7,379, 27,622)      0.85
D              2.626             1,379                    (1,379, 6,282)       <0.0001

Even without the interaction terms, the smearing factor is high. Any insight on this would be highly appreciated. Let me know if more details are required.

Thanks.
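Step 3 above is not shown in the code, so here is a minimal sketch of how it could be written, under stated assumptions. It assumes the quantity being exponentiated is the mean of the predicted log expenditures (predict in ins5) within each ins category. The data set names meanlog and retrans and the variables mean_log and dollars are hypothetical, introduced only for illustration.

```sas
/* Sketch only: meanlog, retrans, mean_log and dollars are hypothetical names. */

/* Mean predicted log expenditure within each ins category.
   Unweighted, to mirror the smearing step above. Adding a WEIGHT
   statement with PERWT17F would give a weighted mean instead
   (an assumption, not something the post states). */
proc summary data=ins5;
  by ins;
  var predict;
  output out=meanlog (keep=ins mean_log) mean(predict)=mean_log;
run;

/* Combine with the smearing factors and retransform to dollars:
   dollars = exp(mean log expenditure) * smearing factor. */
data retrans;
  merge meanlog smeared5;
  by ins;
  dollars = exp(mean_log) * smeared;
run;

proc print data=retrans noobs;
  var ins smeared mean_log dollars;
run;
```

Both inputs are already sorted by ins (ins5 from the PROC SORT, smeared5 from the BY-group PROC SUMMARY), so the merge needs no extra sort. Whether the mean and the smearing factor should be survey-weighted is a separate question that this sketch does not settle.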