byrne48
Calcite | Level 5

I am looking to construct a weight for combined surveys using NHANES data; I am combining 6 survey cycles.

I created a new weight variable using this code:

 

****Construct weight for combined surveys;
data nh1.basedata;
  set nh1.basedata;
  /* Pooling 6 cycles (SDDSRVYR 5-10): divide the day 1 dietary weight by 6 */
  if sddsrvyr in (5, 6, 7, 8, 9, 10) then Weightdiet12YR = WTDRD1 / 6;
run;

When I use this weight to generate frequencies for categorical variables, the weighted frequency comes out as an extremely large number. For example, here are the results for gender:

I used PROC SURVEYFREQ for this.

      
Gender  Frequency  Weighted Frequency  Std Err of Wgt Freq  Percent   Std Err of Percent
1       13265      96910369            2389194              47.9077   0.3588
2       14263      105375165           2517680              52.0923   0.3588
Total   27528      202285535           4687511              100.0000

 

proc surveyfreq data=nh1.basedata1 nomcar;
tables RIAGENDR age_cat PIR BMICAT SMOKECAT;
strata sdmvstra;
cluster sdmvpsu;
weight Weightdiet12YR;
where select=2;
run;
 

Is PROC SURVEYFREQ the best method?

How do I fix the weighted frequency? Is it an issue with how the weight is calculated?

 

Thanks!!!

 

Accepted Solutions
ballardw
Super User

The weights for each survey cycle are designed to yield an estimate of the total population. When you combine x cycles (before any rescaling), you get roughly x times the population total.
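A minimal sketch of that arithmetic, with notation introduced here for illustration (it is not taken from the NHANES documentation): let w_{ic} be the original weight of respondent i in cycle c, N_c the population total those weights estimate, and k the number of pooled cycles. Dividing every weight by k, as the 1/6 in the original post does for six cycles, brings the pooled total back to roughly the average population over the period instead of k times it:

```latex
\sum_{c=1}^{k} \sum_{i \in c} w_{ic} \approx \sum_{c=1}^{k} N_c = k\,\bar{N},
\qquad
\bar{N} = \frac{1}{k}\sum_{c=1}^{k} N_c,
\qquad
\sum_{c=1}^{k} \sum_{i \in c} \frac{w_{ic}}{k} \approx \bar{N}.
```

Either way, the Weighted Frequency column in PROC SURVEYFREQ estimates a count of people in the population, not of sampled respondents, so its totals should be on the order of the population being described.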

 

You have to figure out the contribution of each cycle (i.e., its total weight) and adjust it as a proportion of the desired population year.

That adjustment will likely be made by one or more population characteristics such as age, sex, and geography. At a minimum, those characteristics should include the strata and cluster variables. Hopefully the cluster definitions are the same across cycles; otherwise you have to standardize your data to one set of cluster definitions so the data can be combined correctly.
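Finding each cycle's contribution amounts to summing the weights by cycle. The sketch below is hypothetical and self-contained: the dataset `work.toy`, its values, and the variable `wt_pooled` are made up for illustration, and only `sddsrvyr` and `WTDRD1` are borrowed from the original post. It pools two cycles, so it divides by 2; with the six cycles in the post the divisor would be 6.

```sas
/* Hypothetical toy data (values made up): two cycles whose      */
/* weights each sum to 1000, standing in for population totals.  */
data work.toy;
  input sddsrvyr WTDRD1;
  datalines;
5 400
5 600
6 300
6 700
;
run;

/* Divide each weight by the number of pooled cycles (2 here). */
data work.toy;
  set work.toy;
  if sddsrvyr in (5, 6) then wt_pooled = WTDRD1 / 2;
run;

/* Contribution of each cycle: total original and pooled weight. */
proc means data=work.toy n sum;
  class sddsrvyr;
  var WTDRD1 wt_pooled;
run;
```

In the toy data each cycle's WTDRD1 sums to 1000 (one population total), while `wt_pooled` contributes 500 per cycle, so the pooled weights sum to 1000 overall rather than 2000. Running a similar PROC MEANS with `class sddsrvyr;` on your own data shows whether the cycles contribute comparable totals.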

You should pick one of the years as the standard to adjust to. There are a number of arguments about whether to use the middle year (if available) or one of the end points.
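One possible reading of "adjust as a proportion of the desired population year", sketched here as an assumption rather than an NHANES-prescribed method: within each adjustment cell g (for example, an age-by-sex group), scale cycle c's weights so the cell total matches the chosen standard cycle s, then divide by the number of pooled cycles k. Here w_{ic} is the original weight of respondent i in cycle c, and T_{c,g} is the sum of those weights within cell g (notation introduced for illustration):

```latex
T_{c,g} = \sum_{i \in c,\; i \in g} w_{ic},
\qquad
w^{\mathrm{adj}}_{ic} = w_{ic}\,\frac{T_{s,g}}{k\,T_{c,g}} \quad (i \in g),
\qquad
\sum_{c=1}^{k} \sum_{i \in c,\; i \in g} w^{\mathrm{adj}}_{ic} = \sum_{c=1}^{k} \frac{T_{s,g}}{k} = T_{s,g}.
```

The pooled weights then reproduce the standard year's total in every cell, which is why the choice of standard year matters. This rescaling changes only the weights; the strata and cluster variables still need consistent definitions across cycles.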

 

There are some details specific to NHANES at https://wwwn.cdc.gov/nchs/nhanes/tutorials/Weighting.aspx that specifically mention combining multiple cycles. Follow the links.

 

Google may be your friend sometimes.


