I am looking to construct a weight for combined surveys using NHANES data. I am combining 6 survey cycles.
I created a new weight variable using this code:
* Construct weight for combined surveys: 6 cycles, so 1/6 of WTDRD1;
data nh1.basedata;
  set nh1.basedata;
  if sddsrvyr in (5, 6, 7, 8, 9, 10) then Weightdiet12YR = 1/6 * WTDRD1;
run;
When I use this weight to generate frequencies for categorical variables, the weighted frequency comes out as an extremely large number. For example, for gender (results table and code are below).
I have used proc surveyfreq for this.
gender | frequency | weighted frequency | std err of wgt freq | percent | std err of percent |
1 | 13265 | 96910369 | 2389194 | 47.9077 | 0.3588 |
2 | 14263 | 105375165 | 2517680 | 52.0923 | 0.3588 |
total | 27528 | 202285535 | 4687511 | 100.0000 | |
proc surveyfreq data=nh1.basedata1 nomcar;
  tables RIAGENDR age_cat PIR BMICAT SMOKECAT;
  strata sdmvstra;
  cluster sdmvpsu;
  weight Weightdiet12YR;
  where select=2;
run;
Is proc surveyfreq the best method?
How do I fix the error with the weighted frequency? Is it an issue with how the weight is calculated?
Thanks!!!
The weighted data for each survey is supposed to yield a total population estimate. When you combine x surveys without adjusting the weights, you will get roughly x times the population total.
You have to figure out the contribution of each cycle (i.e. its total weight) and adjust it as a proportion of the population for the reference year you want.
That adjustment is likely done by one or more population characteristics such as age, sex and geography. At a minimum, account for the strata and cluster variables. Also check that the cluster definitions are the same across cycles; if they are not, you have to standardize your data to one cycle's definitions so the data can be combined correctly.
You should pick one of the years as the standard to adjust to. There are a number of arguments about whether to use the middle year, if available, or one of the end points.
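As a rough sketch of how to look at each cycle's contribution, you could sum the original and the rescaled weights by cycle. This assumes the dataset and variable names from the question (nh1.basedata, sddsrvyr, WTDRD1, Weightdiet12YR); the output dataset name cycle_totals and the sum_ variable names are made up for illustration.

```sas
* Sum the original and rescaled weights within each survey cycle;
proc means data=nh1.basedata sum nway noprint;
  class sddsrvyr;
  var WTDRD1 Weightdiet12YR;
  output out=cycle_totals sum=sum_WTDRD1 sum_Weightdiet12YR;
run;

* One row per cycle, with grand totals at the bottom;
proc print data=cycle_totals noobs;
  var sddsrvyr _freq_ sum_WTDRD1 sum_Weightdiet12YR;
  sum sum_WTDRD1 sum_Weightdiet12YR;
run;
```

If each cycle's WTDRD1 total is close to one population total, the grand total of the rescaled weight should also be close to one population total rather than six of them.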
There are some details specific to NHANES at https://wwwn.cdc.gov/nchs/nhanes/tutorials/Weighting.aspx that specifically cover combining multiple cycles. Follow the links.
Google may be your friend sometimes.