I am trying to construct a weight for combined survey cycles using NHANES data; I am combining 6 survey cycles.
I created a new weight variable using this code:
```sas
* Construct weight for combined surveys (SDDSRVYR 5-10 = 6 cycles);
data nh1.basedata;
  set nh1.basedata;
  if sddsrvyr in (5, 6, 7, 8, 9, 10) then Weightdiet12YR = WTDRD1 / 6;
run;
```
When I use this weight to generate frequencies for categorical variables with PROC SURVEYFREQ, the weighted frequency is an extremely large number. For example, for gender (results table below):
| Gender | Frequency | Weighted Frequency | Std Err of Wgt Freq | Percent | Std Err of Percent |
|---|---|---|---|---|---|
| 1 | 13265 | 96910369 | 2389194 | 47.9077 | 0.3588 |
| 2 | 14263 | 105375165 | 2517680 | 52.0923 | 0.3588 |
| Total | 27528 | 202285535 | 4687511 | 100.0000 | |
The PROC SURVEYFREQ code:

```sas
proc surveyfreq data=nh1.basedata1 nomcar;
  tables RIAGENDR age_cat PIR BMICAT SMOKECAT;
  strata sdmvstra;
  cluster sdmvpsu;
  weight Weightdiet12YR;
  where select=2;
run;
```
Is PROC SURVEYFREQ the best method?
How do I fix the error with the weighted frequency? Is it an issue with how the weight is calculated?
Thanks!!!
Accepted Solution
The weighted data for each survey cycle is supposed to yield an estimate of the total population. When you combine x surveys using the original weights, the weights sum to roughly x times the population total.
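A worked version of that arithmetic, using notation introduced here for illustration: $w_i$ is a respondent's original weight (WTDRD1 in the question), $T_c$ is the total weight in cycle $c$, and $N$ is the population total that each cycle's weights estimate, so $T_c \approx N$.

$$
\sum_{c=1}^{x} \sum_{i \in c} w_i \;=\; \sum_{c=1}^{x} T_c \;\approx\; x\,N,
\qquad
\sum_{c=1}^{x} \sum_{i \in c} \frac{w_i}{x} \;=\; \frac{1}{x}\sum_{c=1}^{x} T_c \;\approx\; N .
$$

With $x = 6$, the question's `Weightdiet12YR = WTDRD1 / 6` is the second case: the weighted total is the average of the six cycle totals. That is an estimated number of people, not a sample size, so weighted frequencies on the scale of a population count are expected. The remaining question is which population year the combined weights should represent, which is what the following adjustment addresses.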
You have to figure out the contribution of each cycle (i.e., its total weight) and adjust it as a proportion of the population in the year you want to represent.
The adjustment is likely done by one or more population characteristics such as age, sex, and geography; at a minimum, use the strata and cluster variables. Hope that the cluster definitions are the same across cycles; otherwise you have to standardize your data to one cycle's cluster definitions so the data can be combined correctly.
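As a sketch of the first step, the SAS below sums the weight within each cycle, overall and by sex (RIAGENDR, one of the characteristics mentioned above). It assumes, as in the question, that the weight is WTDRD1 and the cycle identifier is sddsrvyr. The small `work.demo` dataset is made-up toy data standing in for nh1.basedata, not NHANES values, and the output dataset names are hypothetical.

```sas
/* Hypothetical toy data standing in for nh1.basedata.          */
/* Two cycles, three records each; the weights are made up.      */
data work.demo;
  input sddsrvyr RIAGENDR WTDRD1;
  datalines;
5 1 1000
5 2 1500
5 1 2000
6 1 1200
6 2 1800
6 2 2500
;
run;

/* Total weight per cycle: each cycle's population estimate */
proc means data=work.demo noprint nway;
  class sddsrvyr;
  var WTDRD1;
  output out=work.cycle_totals(drop=_type_ _freq_) sum=cycle_total;
run;

/* Same totals broken down by sex within each cycle */
proc means data=work.demo noprint nway;
  class sddsrvyr RIAGENDR;
  var WTDRD1;
  output out=work.cycle_sex_totals(drop=_type_ _freq_) sum=cell_total;
run;

proc print data=work.cycle_totals noobs; run;
proc print data=work.cycle_sex_totals noobs; run;
```

On the real data, replace `work.demo` with nh1.basedata and add a WHERE statement restricting sddsrvyr to the cycles being combined; adding more CLASS variables (for example an age group) gives the contribution of each cycle by those characteristics.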
You should pick one of the years as the standard to adjust to. There are a number of arguments about whether to use the middle year, if available, or one of the end points.
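One simple reading of this adjustment, sketched in SAS under stated assumptions: pick a reference cycle (here, hypothetically, the middle cycle 6 of three toy cycles), then scale each cycle's weights so that every cycle contributes an equal share, reference total / number of cycles, and the combined weights sum to the reference cycle's population. This is an overall rescaling only, not a by-characteristic adjustment and not NCHS's official procedure; `work.demo`, `Weightdiet_adj`, and the macro variable names are hypothetical, and the data are made up.

```sas
/* Hypothetical toy data: three cycles, made-up weights */
data work.demo;
  input sddsrvyr RIAGENDR WTDRD1;
  datalines;
5 1 1000
5 2 1500
5 1 2000
6 1 1200
6 2 1800
6 2 2500
7 1 1300
7 2 1700
7 1 3000
;
run;

proc sql noprint;
  /* Total weight contributed by each cycle */
  create table work.cycle_totals as
    select sddsrvyr, sum(WTDRD1) as cycle_total
    from work.demo
    group by sddsrvyr;
  /* Number of cycles being combined */
  select count(*) into :n_cycles trimmed
    from work.cycle_totals;
  /* Assumed reference year: cycle 6 (the middle cycle here) */
  select cycle_total format=best32. into :ref_total trimmed
    from work.cycle_totals
    where sddsrvyr = 6;
quit;

/* Each cycle is scaled to contribute ref_total / n_cycles,  */
/* so the combined weights sum to the reference total.        */
proc sql;
  create table work.demo_adj as
    select d.*,
           d.WTDRD1 * &ref_total / (&n_cycles * c.cycle_total) as Weightdiet_adj
    from work.demo as d
         inner join work.cycle_totals as c
         on d.sddsrvyr = c.sddsrvyr;
quit;

/* Check: per-cycle and overall sums of the original and adjusted weights */
proc means data=work.demo_adj sum;
  class sddsrvyr;
  var WTDRD1 Weightdiet_adj;
run;
```

To adjust by characteristics instead, compute the totals by sddsrvyr and, for example, RIAGENDR, take the reference cycle's total within each cell, and join on both keys. The adjusted variable would then go in the WEIGHT statement of PROC SURVEYFREQ with the same STRATA and CLUSTER statements.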
There are NHANES-specific details at https://wwwn.cdc.gov/nchs/nhanes/tutorials/Weighting.aspx, which specifically covers combining multiple cycles. Follow the links.
Google may be your friend sometimes.