SSK_011523
Calcite | Level 5

Hi 

 

I am working on a project to examine costs using a nationally representative database, the Medical Expenditure Panel Survey (MEPS). MEPS has a complex survey design with weight, cluster, and strata variables. I analyzed costs using OLS on log-transformed expenditures with the Duan smearing estimator (which corrects the retransformation bias that arises when converting from the logarithmic scale back to the original scale). For one predictor variable with four categories, I am getting non-overlapping 95% CIs, yet the p-value is > 0.05. My understanding is that if the 95% CIs do not overlap, the p-value should be < 0.05 and the difference should be statistically significant. Typically the Duan smearing factor lies between 1 and 4, but for one category of the key predictor I am getting 5.1.

 

Here's what I have done so far:

1) Since the cost distribution is non-normal and heteroskedastic, I transformed the expenditures to a logarithmic scale.

2) I fit an OLS model, adjusting for confounders and interaction terms, to get the residuals and predicted values needed for the Duan smearing factor.

3) Once I had the smearing factor for each category of the key predictor variable (ins), I multiplied it by the exponentiated mean of the log expenditures to get the dollar value.
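For reference, the three steps above correspond to the usual form of Duan's smearing retransformation. The following is only a sketch in standard notation; the symbols are not from the original post. Here $\hat{\varepsilon}_i$ are the log-scale residuals, $\hat{S}$ is a smearing factor pooled over all $n$ observations, and $\hat{S}_g$ is the per-category version (one factor per level $g$ of ins, with $n_g$ observations), which is what the code below computes.

```latex
\ln y_i = x_i^{\top}\beta + \varepsilon_i
\qquad
\hat{S} = \frac{1}{n}\sum_{i=1}^{n} \exp(\hat{\varepsilon}_i)
\qquad
\hat{S}_g = \frac{1}{n_g}\sum_{i \in g} \exp(\hat{\varepsilon}_i)
\qquad
\widehat{E}[\,y \mid x\,] = \exp\!\left(x^{\top}\hat{\beta}\right)\cdot \hat{S}
```

Using $\hat{S}_g$ in place of $\hat{S}$ in the last expression gives the category-specific retransformation described in step 3.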

 

***Mean of log not reported here

 

************************

*SAS code*

************************;

/*All explanatory variables are categorical*/

 

Proc sort data=mylib.finalcohort; by ins; run;

 

Proc surveyreg data=mylib.finalcohort;
class ins age17x race education income marital_status cancertype nphycomorb nmentalcomorb disability_status;
model log_totalexp=ins age17x race education income marital_status cancertype nphycomorb nmentalcomorb disability_status
ins*age17x  ins*race ins*education ins*income ins*marital_status ins*cancertype ins*nphycomorb ins*nmentalcomorb ins*disability_status/clparm solution;
WEIGHT PERWT17F;
STRATA VARSTR/nocollapse;
CLUSTER VARPSU;
output out=demo5 residual=RESID Pred=predict;
format age17x age. income income. race racex. marital_status marry. education educatn.
nphycomorb phycond. nmentalcomorb mntlcond. ins ins. cancertype cancertype. disability_status disabled. ;
run;

**********************************
*Estimating duan smearing factor*
**********************************;

/*Sort the data by exposure or grouping variable*/

 

proc sort data=demo5 out=ins5; by ins; run;

/*exponentiate the residuals*/
data resid1;
set ins5;
by ins;
exp_residual=exp(Resid);
run;

Proc sort data=resid1; by ins; run;

/*mean of the exponentiated residuals within each ins category = smearing factor per category*/
Proc summary data= resid1;
var exp_residual;
by ins;
output out= smeared5 (keep=smeared ins) mean(exp_residual)=smeared;
run;

PROC FREQ data= smeared5; TABLES ins*smeared; run;

 

Ins             Smearing factor   Total expenditures ($)   95% CI lower   95% CI upper   p-value
A (reference)   2.19              13238                    11,729         14956.5        .
B               2.36              17812                    17,812         20,103         0.55
C               5.4               14276                    7,379          27,622         0.85
D               2.626             1379                     1,379          6,282          <0.0001

 

Even without the interaction terms, the smearing factor is high.

Any insight on this would be highly appreciated. Let me know if more details are required. Thanks.

 

ballardw
Super User

I don't know smearing so can't address anything related to that.

 

You state: "I am getting non-overlapping 95%CI but the p-value is > 0.05. My understanding is if the 95%CI are not overlapping then p-value should be < 0.05".

 

You may need to describe "confidence interval around what". I am suspicious about the 95% CI column you posted, where the value in two of the rows looks exactly like the "total expenditures" value but with a comma. If you are expecting a confidence interval around expenditures, I find it very odd that the estimated value is, possibly, the lower limit.

 

I don't see any example of non-overlapping confidence intervals in your post. Perhaps you need to explicitly point out which ones do not overlap, or provide more output. If you are going to post output, make sure that the results, such as column headings, align. This forum does a lot of "reformatting" of posted items, and generally it is best to 1) set ODS LISTING, 2) copy the results from the Output window (not Results), and 3) paste that into a text box opened on the forum with the </> icon to keep the forum from reformatting the text.
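As a minimal sketch of step 1, assuming the smeared5 data set from the original post is what needs to be shared, the LISTING destination can be opened around the step that prints the results so that the plain-text output appears in the Output window:

```sas
/* Open the plain-text LISTING destination so output goes to the Output window */
ods listing;

/* smeared5 is the data set of smearing factors created in the original post */
proc print data=smeared5 noobs;
run;

/* Close the destination again when done */
ods listing close;
```

The same wrapper could be placed around the PROC SURVEYREG step to copy its parameter estimates with the columns aligned.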

SSK_011523
Calcite | Level 5

Hi 

 

Thank you for your response.

 

I am talking about the 95% CI around the total expenditures. For A, the total expenditure is $13,238 with a 95% CI of $11,729 to $14,956.5, while for B the total expenditure is $17,812 ($15,656 to $20,103). A and B have non-overlapping 95% CIs, and yet the difference between them is not statistically significant. Yes, there was a formatting error when I pasted the results table. The total expenditures and the lower limit of the 95% CI are not the same.

 

There is no direct SAS output for this. I derived these expenditures from the mean of the log expenditures and the Duan smearing estimator, as explained in my first post. From the regression model, I got the beta estimates and p-values.

UNADJUSTED REG MODEL

Parameter    Estimate      SE           p-value   95% CI lower   95% CI upper
Intercept    8.7077590     0.06180345   <.0001    8.5859310      8.8295870
D            -2.4432861    0.77467831   0.0018    -3.9703451     -0.9162270
C            -0.8228652    0.34328109   0.0174    -1.4995467     -0.1461836
B            0.2210763     0.09322939   0.0186    0.0373009      0.4048517
A            0.0000000     0.00000000   .         0.0000000      0.0000000

 

ADJUSTED REGRESSION MODEL

Parameter    Estimate      SE           p-value   95% CI lower   95% CI upper
Intercept    8.4663298     0.42958130   <.0001    7.619532       9.3131278
D            -3.8837239    0.41579548   <.0001    -4.703347      -3.0641007
C            0.2557959     1.39382224   0.8546    -2.491730      3.0033220
B            0.4380118     0.73367717   0.5511    -1.008225      1.8842487
A            0.0000000     0.00000000   .         0.000000       0.0000000

 

 

PGStats
Opal | Level 21

Were the CIs consistent with the p-values BEFORE applying the smearing factors? The surveyreg model (linear) assumes that the residuals all come from the same distribution, and the p-values reflect that. By calculating separate smearing corrections for each ins, you are violating that assumption. What happens if you calculate a single smearing factor?
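To illustrate the point about separate smearing corrections, consider the ratio of retransformed means for category B versus reference A, holding the other covariates at their reference levels. This is a sketch in notation that is not from the thread: $\hat{\beta}_B$ is the log-scale coefficient for B, $\hat{S}_g$ is a smearing factor computed within category $g$, and $\hat{S}$ is a single pooled factor.

```latex
\text{separate factors:}\quad
\frac{\widehat{E}[y \mid B]}{\widehat{E}[y \mid A]}
  = \exp(\hat{\beta}_B)\,\frac{\hat{S}_B}{\hat{S}_A}
\qquad\qquad
\text{single factor:}\quad
\frac{\widehat{E}[y \mid B]}{\widehat{E}[y \mid A]}
  = \exp(\hat{\beta}_B)
```

With a single factor, the interval for the ratio, $\exp(\hat{\beta}_B \pm t \cdot SE)$, excludes 1 exactly when the log-scale interval excludes 0, so it agrees with the p-value. For the adjusted estimate for B posted above (0.438, 95% CI -1.008 to 1.884), this gives a ratio of about 1.55 with an interval of roughly 0.36 to 6.58, which contains 1 and is consistent with p = 0.55. With separate factors, the extra term $\hat{S}_B / \hat{S}_A$ shifts the dollar-scale comparison, and its variability is not reflected in the model's standard errors.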

PG
SSK_011523
Calcite | Level 5

Hi 

 

That's a very good point.

 

The 95% CIs from the adjusted regression model were consistent with the p-values. I haven't calculated a single smearing factor across all categories of ins, and I am not sure how to do it in SAS.

 

The following code calculates the smearing factor for each category of ins:

**********************************
*Estimating duan smearing factor*
**********************************;

/*Sort the data by exposure or grouping variable*/

proc sort data=demo5 out=ins5; by ins; run;

/*exponentiate the residuals*/

data resid1;
set ins5;
by ins;
exp_residual=exp(Resid);
run;

Proc sort data=resid1; by ins; run;

/*mean of the exponentiated residuals within each ins category = smearing factor per category*/
Proc summary data= resid1;
var exp_residual;
by ins;
output out= smeared5 (keep=smeared ins) mean(exp_residual)=smeared;
run;

PROC FREQ data= smeared5; TABLES ins*smeared; run;

PGStats
Opal | Level 21

Just remove by ins; in proc summary, and ins from the keep clause, it will calculate a single factor from all residuals.

PG
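The change described in this reply could look like the sketch below. It reuses the resid1 data set and the exp_residual and predict variables from the code posted earlier; the names smeared_all, retrans, and pred_dollars are hypothetical and only for illustration. The second step, which applies the pooled factor to the log-scale predictions, is an assumption about how the factor would then be used and is not part of the reply.

```sas
/* Pooled Duan smearing factor: no BY statement, so PROC SUMMARY */
/* takes one mean over all exponentiated residuals               */
proc summary data=resid1;
  var exp_residual;
  output out=smeared_all (keep=smeared) mean(exp_residual)=smeared;
run;

/* Show the single factor (one row, one column) */
proc print data=smeared_all noobs;
run;

/* Assumed follow-up: attach the single factor to every row and      */
/* retransform the log-scale predictions to the dollar scale.        */
/* Variables read by SET are retained, so smeared is kept on all rows */
data retrans;
  if _n_ = 1 then set smeared_all;
  set resid1;
  pred_dollars = exp(predict) * smeared;
run;
```

This mean is unweighted, as in the original code. Whether it should instead be weighted by PERWT17F (for example by adding a WEIGHT statement to PROC SUMMARY) is a judgment call that the thread does not address.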
