Solved: Re: How to use OLS with duan smearing estimator for analyzing costs

SSK_011523 · Posted 03-05-2021 03:12 PM

Hi

I am working on a project to examine costs using a nationally representative database, the Medical Expenditure Panel Survey (MEPS). MEPS has a complex survey design with weight, cluster, and strata variables. I used OLS with the Duan smearing estimator (used to retransform from the logarithmic scale to the original scale while avoiding retransformation bias) to analyze costs. For one predictor variable with four categories, I am getting non-overlapping 95% CIs, but the p-value is > 0.05. My understanding is that if the 95% CIs do not overlap, the p-value should be < 0.05 and the difference should be statistically significant. Typically the Duan smearing factor lies between 1 and 4, but here, for one category of the key predictor, I am getting 5.1.

Here's what I have done so far:

1) Since the cost distribution is non-normal and heteroskedastic, I transformed the expenditures to a logarithmic scale.

2) I used OLS, adjusting for confounders and interaction terms, to get the residuals and predicted values needed for the Duan smearing factor.

3) Once I had the smearing factor for each category of the key predictor variable (ins), I multiplied it by the exponential of the mean of log expenditures to get the $ value.

***Mean of log not reported here
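
For reference, the retransformation described in the steps above can be written out as follows. This is a sketch in notation that does not appear in the post: \hat{\varepsilon}_i denotes the log-scale OLS residuals, n_g the number of observations in ins category g, and \hat{s}_g the smearing factor for that category. The unweighted mean mirrors what the code below computes.

```latex
\hat{s}_g = \frac{1}{n_g} \sum_{i \in g} \exp(\hat{\varepsilon}_i),
\qquad
\widehat{E}[y \mid x,\ \mathrm{ins} = g] = \exp\!\left(x^{\top}\hat{\beta}\right)\hat{s}_g
```

A single pooled factor would instead average \exp(\hat{\varepsilon}_i) over all n observations and apply the same multiplier to every category.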

************************

*SAS code*

************************;

/*All explanatory variables are categorical*/

Proc sort data=mylib.finalcohort; by ins; run;

Proc surveyreg data=mylib.finalcohort;
class ins age17x race education income marital_status cancertype nphycomorb nmentalcomorb disability_status;
model log_totalexp=ins age17x race education income marital_status cancertype nphycomorb nmentalcomorb disability_status
ins*age17x ins*race ins*education ins*income ins*marital_status ins*cancertype ins*nphycomorb ins*nmentalcomorb ins*disability_status/clparm solution;
WEIGHT PERWT17F;
STRATA VARSTR/nocollapse;
CLUSTER VARPSU;
output out=demo5 residual=RESID Pred=predict;
format age17x age. income income. race racex. marital_status marry. education educatn.
nphycomorb phycond. nmentalcomorb mntlcond. ins ins. cancertype cancertype. disability_status disabled. ;
run;

**********************************
*Estimating duan smearing factor*
**********************************;

/*Sort the data by exposure or grouping variable*/

proc sort data=demo5 out=ins5; by ins; run;

/*exponentiate the residuals*/
data resid1;
set ins5;
by ins;
exp_residual=exp(Resid);
run;

Proc sort data=resid1; by ins; run;

/*smearing factor: mean of the exponentiated residuals within each ins category*/
Proc summary data= resid1;
var exp_residual;
by ins;
output out= smeared5 (keep=smeared ins) mean(exp_residual)=smeared;
run;

PROC FREQ data= smeared5; TABLES ins*smeared; run;

Ins	Smearing factor	Total expenditures ($)	95% CI lower	95% CI upper	p-value
A (reference)	2.19	13238	11,729	14956.5	.
B	2.36	17812	17,812	20,103	0.55
C	5.4	14276	7,379	27,622	0.85
D	2.626	1379	1,379	6,282	<0.0001

Even without the interaction terms, the smearing factor is high.

Any insight on this would be highly appreciated. Let me know if more details are required. Thanks.

ballardw · Posted 03-05-2021 05:48 PM

I don't know smearing so can't address anything related to that.

You say: "I am getting non-overlapping 95%CI but the p-value is > 0.05. My understanding is if the 95%CI are not overlapping then p-value should be < 0.05".

You may need to describe "confidence interval around what". I am suspicious about the 95% CI column you posted, where the values for two of the rows look exactly like the "total expenditures" values but with a comma. If you are expecting a confidence interval around expenditures, I find it very odd that the estimated value is, possibly, the lower limit.

I don't see any example of non-overlapping confidence intervals in your post. Perhaps you need to explicitly point out which ones do not overlap, or provide more output. If you are going to post output, make sure that the results, such as column headings, align. This forum does a lot of "reformatting" of posted items, and generally it is best to 1) set ODS LISTING, 2) copy the results from the Output window (not Results) and 3) paste that into a text box opened on the forum with the </> icon to keep the forum from reformatting the text.

SSK_011523 · Posted 03-07-2021 10:47 PM

Hi

Thank you for your response.

I am talking about the 95% CI around the total expenditures. For A, the total expenditure is $13,238 with a 95% CI of $11,729 to $14,956.5, while for B the total expenditure is $17,812 ($15,656 to $20,103). A and B have non-overlapping 95% CIs, and yet the difference is not statistically significant. Yes, there was a formatting error when I pasted the results table. The total expenditures and the lower limit of the 95% CI are not the same.

There is no direct SAS output for this. I got these expenditures from the mean of log expenditures and the Duan smearing estimator, as explained in my first post. From the regression model, I got the beta estimates and p-values:

UNADJUSTED REG MODEL

Parameter	Estimate	SE	p-value	95% CI lower	95% CI upper
Intercept	8.7077590	0.06180345	<.0001	8.5859310	8.8295870
D	-2.4432861	0.77467831	0.0018	-3.9703451	-0.9162270
C	-0.8228652	0.34328109	0.0174	-1.4995467	-0.1461836
B	0.2210763	0.09322939	0.0186	0.0373009	0.4048517
A	0.0000000	0.00000000	.	0.0000000	0.0000000

ADJUSTED REGRESSION MODEL

Parameter	Estimate	SE	p-value	95% CI lower	95% CI upper
Intercept	8.4663298	0.42958130	<.0001	7.619532	9.3131278
D	-3.8837239	0.41579548	<.0001	-4.703347	-3.0641007
C	0.2557959	1.39382224	0.8546	-2.491730	3.0033220
B	0.4380118	0.73367717	0.5511	-1.008225	1.8842487
A	0.0000000	0.00000000	.	0.000000	0.0000000

PGStats · Posted 03-05-2021 05:49 PM

Were the CIs consistent with the p-values BEFORE applying the smearing factors? The surveyreg model (linear) assumes that the residuals all come from the same distribution, and the p-values reflect that. By calculating separate smearing corrections for each ins, you are violating that assumption. What happens if you calculate a single smearing factor?

PG
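
To make the point above concrete, here is a sketch for a model without interaction terms, comparing category B with the reference category A at the same covariate values z. The symbols \hat{s}_A and \hat{s}_B for the per-category smearing factors are notation assumed here, not taken from the thread.

```latex
\frac{\widehat{E}[y \mid \mathrm{ins} = B,\ z]}{\widehat{E}[y \mid \mathrm{ins} = A,\ z]}
  = \exp(\hat{\beta}_B)\,\frac{\hat{s}_B}{\hat{s}_A}
```

With a single pooled factor, \hat{s}_A = \hat{s}_B and the ratio reduces to \exp(\hat{\beta}_B), which is exactly the quantity the regression p-value tests. With separate factors, the dollar-scale comparison also depends on \hat{s}_B / \hat{s}_A, whose sampling variability the regression test does not account for, so the retransformed intervals and the p-value need not agree.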

SSK_011523 · Posted 03-07-2021 11:05 PM

Hi

That's a very good point.

The 95% CIs from the adjusted regression model were consistent with the p-values. I haven't calculated a single smearing factor for ins, and I am not sure how to do it in SAS.

The following code calculates the smearing factor for each category of ins:

**********************************
*Estimating duan smearing factor*
**********************************;

/*Sort the data by exposure or grouping variable*/

proc sort data=demo5 out=ins5; by ins; run;

/*exponentiate the residuals*/

data resid1;
set ins5;
by ins;
exp_residual=exp(Resid);
run;

Proc sort data=resid1; by ins; run;

/*smearing factor: mean of the exponentiated residuals within each ins category*/
Proc summary data= resid1;
var exp_residual;
by ins;
output out= smeared5 (keep=smeared ins) mean(exp_residual)=smeared;
run;

PROC FREQ data= smeared5; TABLES ins*smeared; run;

PGStats · Posted 03-07-2021 11:58 PM

Just remove by ins; from proc summary, and ins from the keep clause; it will then calculate a single factor from all residuals.

PG
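
Applied to the code posted earlier in the thread, the suggested change might look like the sketch below. The dataset names smeared_all and pred_dollars and the variable pred_exp are hypothetical; resid1, exp_residual, and predict come from the original code. The second step, which applies the factor to the log-scale predictions, is an assumed follow-up and was not part of the reply.

```sas
/* One smearing factor from all residuals: no BY statement, ins dropped from KEEP */
proc summary data=resid1;
  var exp_residual;
  output out=smeared_all (keep=smeared) mean(exp_residual)=smeared;
run;

proc print data=smeared_all; run;

/* Hypothetical follow-up: apply the single factor to the log-scale predictions */
data pred_dollars;
  if _n_ = 1 then set smeared_all;   /* smeared is read once and retained on every row */
  set resid1;
  pred_exp = exp(predict) * smeared; /* retransformed prediction on the dollar scale */
run;
```

Because there is no BY statement, resid1 does not need to be sorted by ins for this step.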
