Hi,

I am working on a project to examine costs using a nationally representative database, the Medical Expenditure Panel Survey (MEPS). MEPS has a complex survey design with weight, cluster, and strata variables. I used OLS on log-transformed costs with the Duan smearing estimator (which retransforms from the logarithmic scale back to the original scale while avoiding retransformation bias).

My problem: for one predictor variable with four categories, I am getting non-overlapping 95% CIs but a p-value > 0.05. My understanding is that if the 95% CIs do not overlap, the p-value should be < 0.05 and the difference should be statistically significant. Also, the Duan smearing factor typically lies between 1 and 4, but for one category of the key predictor I am getting 5.1.

Here's what I have done so far:

1) Since the cost distribution is non-normal and heteroskedastic, I transformed the expenditures to a logarithmic scale.
2) I ran OLS, adjusting for confounders and interaction terms, to get the residuals and predicted values needed for the Duan smearing factor.
3) Once I had the smearing factor for each category of the key predictor variable (ins), I multiplied it by the exponential of the mean log expenditure to get the $ value. (The mean of the log is not reported here.)

************************
*SAS code*
************************;

/*All explanatory variables are categorical*/
proc sort data=mylib.finalcohort;
  by ins;
run;

proc surveyreg data=mylib.finalcohort;
  class ins age17x race education income marital_status cancertype
        nphycomorb nmentalcomorb disability_status;
  model log_totalexp = ins age17x race education income marital_status
        cancertype nphycomorb nmentalcomorb disability_status
        ins*age17x ins*race ins*education ins*income ins*marital_status
        ins*cancertype ins*nphycomorb ins*nmentalcomorb
        ins*disability_status / clparm solution;
  weight PERWT17F;
  strata VARSTR / nocollapse;
  cluster VARPSU;
  output out=demo5 residual=RESID pred=predict;
  format age17x age. income income. race racex. marital_status marry.
         education educatn. nphycomorb phycond. nmentalcomorb mntlcond.
         ins ins. cancertype cancertype. disability_status disabled.;
run;

**********************************
*Estimating duan smearing factor*
**********************************;

/*Sort the data by exposure or grouping variable*/
proc sort data=demo5 out=ins5;
  by ins;
run;

/*Exponentiate the residuals*/
data resid1;
  set ins5;
  by ins;
  exp_residual = exp(RESID);
run;

proc sort data=resid1;
  by ins;
run;

/*Mean of the exponentiated residuals within each ins category (the smearing factor)*/
proc summary data=resid1;
  var exp_residual;
  by ins;
  output out=smeared5 (keep=smeared ins) mean(exp_residual)=smeared;
run;

/*Display the smearing factor for each ins category*/
proc freq data=smeared5;
  tables ins*smeared;
run;

Results:

Ins            Smearing factor   Total expenditures ($)   95% CI               p-value
A (reference)  2.19              13,238                   11,729 to 14,956.5   .
B              2.36              17,812                   17,812 to 20,103     0.55
C              5.4               14,276                   7,379 to 27,622      0.85
D              2.626             1,379                    1,379 to 6,282       <0.0001

Even without the interaction terms, the smearing factor is high. Any insight on this would be highly appreciated. Let me know if more details are required.

Thanks.
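
P.S. In case it helps anyone check my logic, here is the retransformation from steps 2 and 3 written out as a formula. This is only a sketch of the group-specific smearing calculation as I have described it; the symbols (g for an ins category, n_g for the number of observations in it, and the hats for fitted quantities) are my own notation and do not come from the SAS output. Note that PROC SUMMARY above has no WEIGHT statement, so the mean of the exponentiated residuals is an unweighted mean.

```latex
% Residual on the log scale for observation i (RESID in the OUTPUT data set)
\hat{\varepsilon}_i = \log y_i - \widehat{\log y_i}

% Smearing factor for ins category g: unweighted mean of exp(residual)
\hat{\phi}_g = \frac{1}{n_g} \sum_{i \in g} \exp\left(\hat{\varepsilon}_i\right)

% Retransformed expenditure in dollars for category g (step 3)
\hat{y}_g = \hat{\phi}_g \cdot \exp\left(\overline{\log y}_g\right)
```

If the residuals within a category were all zero, the factor would be exactly 1, so a larger factor reflects more spread (or a few large positive residuals) on the log scale within that category.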
