ashlicole
Obsidian | Level 7

I'm hoping someone can help me understand when it is appropriate to add a WEIGHT statement to PROC MEANS and when I need to use PROC SURVEYMEANS instead. I have read the documentation and a few papers on the topic, but I still can't quite get a handle on this for my specific situation(s). I have little experience with survey data, so I apologize in advance for my ignorance.

Situation #1: I'm using an insurance claims database and I'd like to standardize my estimates to the U.S. population using weights I have calculated from U.S. census data. In other words, I will apply weights such that the demographic distribution of people in my dataset matches the demographic distribution of the U.S. and no subpopulations are over- or under-represented. There was no complex sampling scheme for this database -- it is a convenience sample. Is using a WEIGHT statement with PROC MEANS sufficient if I want to calculate the weighted average of, say, healthcare expenditures, provider visits, etc.? Why or why not? 
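
For readers unfamiliar with this kind of weighting, here is a minimal Python sketch of one common way to build such weights (post-stratification: population share divided by sample share within each demographic cell). The function name `poststrat_weights`, the cell labels, and the population shares are all made up for illustration; they are not the poster's data or method.

```python
from collections import Counter

def poststrat_weights(sample_cells, population_share):
    """One weight per record: population share of its cell / sample share of its cell."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [population_share[c] / (counts[c] / n) for c in sample_cells]

# Hypothetical demographic cell for each person in the claims data.
cells = ["F 18-44", "F 18-44", "F 18-44", "M 18-44", "F 45+", "M 45+"]
# Hypothetical census shares (must sum to 1 across cells).
pop = {"F 18-44": 0.25, "M 18-44": 0.25, "F 45+": 0.25, "M 45+": 0.25}

w = poststrat_weights(cells, pop)

# After weighting, each cell's share of the total weight matches the census share.
weighted_share = {
    c: sum(wi for wi, ci in zip(w, cells) if ci == c) / sum(w) for c in pop
}
print(w)
print(weighted_share)
```

Over-represented cells (here the three "F 18-44" records) get weights below 1 and under-represented cells get weights above 1, which is the behavior described above.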

Situation #2: Using the same database consisting of a convenience sample of the U.S. population, I want to propensity-score weight 2 treatment groups. I have calculated weights such that the 2 groups will be similar on important baseline characteristics when the weights are applied. I want to compare outcomes (e.g., healthcare expenditures, provider visits) between the weighted groups. Is using a WEIGHT statement with PROC MEANS sufficient, or do I need to use PROC SURVEYMEANS?

I suspect using a WEIGHT statement might be sufficient for both of these purposes, but the only examples I can seem to find are situations where the WEIGHT statement is NOT sufficient. Again, apologies for my complete ignorance on this topic! Any helpful input or papers you can recommend are greatly appreciated!

5 REPLIES
evardoodle
Fluorite | Level 6

The WEIGHT statement should be sufficient for both situations. PROC SURVEYMEANS was developed for complex survey designs; see the SURVEYMEANS Overview in the documentation. Your study doesn't have stratified or clustered sampling. Furthermore, subgroup analysis is much more straightforward in the MEANS procedure.

Reeza
Super User

For #1, use PROC STDRATE to compute age/sex-standardized rates.

#2 - yes, use PROC SURVEYMEANS/SURVEYFREQ, because the standard errors and/or confidence intervals will be different. The means should be the same from either procedure.

ashlicole
Obsidian | Level 7

Hi Reeza,

Thanks for your reply. You're correct that both approaches produce the same means but different standard errors. Is it possible to obtain the correct variance estimate using PROC MEANS with a WEIGHT statement and the VARDEF= option? I'm a little confused about which VARDEF= value would be appropriate.
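
To make the "same means, different standard errors" point concrete, here is a minimal Python sketch (not SAS) of two standard-error formulas. It assumes that PROC MEANS with a WEIGHT statement and the default VARDEF=DF reports s / sqrt(sum of weights), with s^2 = sum of w(x - mean)^2 / (n - 1), and that PROC SURVEYMEANS with no STRATA or CLUSTER statement and no finite population correction uses the usual Taylor linearization for a weighted mean. The function names and the spending values are invented for illustration.

```python
import math

def weighted_mean(x, w):
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)

def se_means_vardef_df(x, w):
    """Assumed PROC MEANS (WEIGHT, VARDEF=DF) form: s / sqrt(sum of weights)."""
    n = len(x)
    xbar = weighted_mean(x, w)
    s2 = sum(wi * (xi - xbar) ** 2 for xi, wi in zip(x, w)) / (n - 1)
    return math.sqrt(s2 / sum(w))

def se_taylor(x, w):
    """Assumed linearization form: no strata, no clusters, no population correction."""
    n = len(x)
    xbar = weighted_mean(x, w)
    sw = sum(w)
    # Linearized value for each record: w_i * (x_i - xbar) / sum(w); these sum to zero.
    v = n / (n - 1) * sum((wi * (xi - xbar) / sw) ** 2 for xi, wi in zip(x, w))
    return math.sqrt(v)

spend = [120.0, 80.0, 400.0, 60.0, 950.0, 150.0]   # made-up expenditures
equal = [1.0] * len(spend)
unequal = [0.5, 0.5, 1.0, 1.0, 3.0, 3.0]           # made-up weights

for label, w in (("equal", equal), ("unequal", unequal)):
    print(label, weighted_mean(spend, w),
          se_means_vardef_df(spend, w), se_taylor(spend, w))
```

Under these assumptions the point estimate is identical for both methods, and the two standard errors agree when all weights are equal but diverge once the weights vary. The first formula treats the weights as inverse-variance (precision) weights, while the second treats them as sampling-type weights. Regarding VARDEF=, the PROC MEANS documentation states that standard errors and confidence limits are computed only with the default VARDEF=DF, so changing the divisor is unlikely to reproduce the SURVEYMEANS result.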

Reeza
Super User

I'm pretty certain you need to be using PROC SURVEYMEANS. 

Check the WEIGHT statement section of the PROC MEANS documentation. I think it has a note about when you should use which procedure.

EDIT: I've also moved this thread to the Statistical Procedures forum, so someone with more knowledge can hopefully chime in 🙂

evardoodle
Fluorite | Level 6

Sorry I can't help more regarding correct SE methods. As I'm sure you know, most resources that turn up in internet searches are directed specifically toward complex sampling schemes, not toward unequal weighting alone. How different are the standard errors between MEANS and SURVEYMEANS? If you don't have a complex sampling design, they should be pretty similar. Also check your weights for extreme values, as these can be problematic. An alternative is to calculate stabilized weights, which usually reduces standard errors. Good luck.
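
The weight checks mentioned above can be sketched in a few lines of Python. This assumes the propensity-score weights are inverse probability of treatment weights (the thread does not say which weighting scheme was used), and the function names, treatment indicators, and propensity scores are all hypothetical. Stabilized weights multiply each raw weight by the marginal probability of the treatment actually received, and the Kish effective sample size, (sum of w)^2 / sum of w^2, is one simple diagnostic for extreme weights.

```python
import math

def iptw_weights(treated, ps, stabilized=False):
    """Inverse probability of treatment weights; optionally stabilized."""
    p_treat = sum(treated) / len(treated)      # marginal P(treated)
    weights = []
    for t, p in zip(treated, ps):
        w = 1.0 / p if t else 1.0 / (1.0 - p)  # 1 / P(received own treatment | X)
        if stabilized:
            w *= p_treat if t else (1.0 - p_treat)
        weights.append(w)
    return weights

def kish_ess(w):
    """Kish effective sample size: equals n when all weights are equal."""
    return sum(w) ** 2 / sum(wi * wi for wi in w)

treated = [1, 1, 1, 0, 0, 0, 0, 0]                 # hypothetical treatment flags
ps = [0.8, 0.5, 0.05, 0.4, 0.3, 0.2, 0.6, 0.1]     # hypothetical propensity scores

raw = iptw_weights(treated, ps)
stab = iptw_weights(treated, ps, stabilized=True)
raw_t, stab_t = raw[:3], stab[:3]                  # treated records only

print("max raw weight:", max(raw), "max stabilized:", max(stab))
print("ESS, all records (raw):", kish_ess(raw), "of", len(raw))
print("ESS, treated only:", kish_ess(raw_t), kish_ess(stab_t))
```

In this made-up data, the treated record with a propensity score of 0.05 dominates its group, which shows up as an effective sample size far below the number of records. Because the Kish effective sample size does not change when every weight is multiplied by the same constant, stabilizing leaves each treatment group's own effective sample size unchanged; it mainly matters when both groups are pooled in a single weighted analysis.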
