## Using weighted frequencies for NHANES data

I am looking to construct a weight for combined surveys using NHANES data. I am combining 6 survey cycles.

I created a new weight variable using this code:

```
**** Construct weight for combined surveys;
data nh1.basedata;
    set nh1.basedata;
    if sddsrvyr in (5, 6, 7, 8, 9, 10) then Weightdiet12YR = 1/6 * WTDRD1;
run;
```

When I use this weight to generate frequencies for categorical variables, the weighted frequency comes out as an extremely large number. For example, for gender (results table is below).

I have used proc surveyfreq for this.

| Gender | Frequency | Weighted Frequency | Std Err of Wgt Freq | Percent | Std Err of Percent |
|---|---|---|---|---|---|
| 1 | 13265 | 96910369 | 2389194 | 47.9077 | 0.3588 |
| 2 | 14263 | 105375165 | 2517680 | 52.0923 | 0.3588 |
| Total | 27528 | 202285535 | 4687511 | 100.0000 | |

```
proc surveyfreq data=nh1.basedata1 nomcar;
    tables RIAGENDR age_cat PIR BMICAT SMOKECAT;
    strata sdmvstra;
    cluster sdmvpsu;
    weight Weightdiet12YR;
    where select=2;
run;
```

Is proc surveyfreq the best method?

How do I fix the error with the weighted frequency? Is it an issue with how the weight is calculated?

Thanks!!!

Accepted Solutions

## Re: Using weighted frequencies for NHANES data

The weighted data for each survey cycle is supposed to yield a total population estimate. When you combine x surveys, the weights together sum to roughly x times the population total.
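The arithmetic behind that statement can be written out as a short sketch. The symbols here are introduced only for illustration and are not from the original posts: $w_{ic}$ is the weight of respondent $i$ in cycle $c$, $N_c$ is the population total that cycle $c$ represents, $k$ is the number of pooled cycles, and $\bar{N}$ is the average of the $N_c$.

$$
\sum_{i} w_{ic} \approx N_c
\quad\Longrightarrow\quad
\sum_{c=1}^{k} \sum_{i} w_{ic} \approx \sum_{c=1}^{k} N_c = k\,\bar{N}
$$

Dividing every weight by $k$ brings the pooled total back to roughly one population:

$$
w'_{ic} = \frac{w_{ic}}{k}
\quad\Longrightarrow\quad
\sum_{c=1}^{k} \sum_{i} w'_{ic} \approx \bar{N}
$$

With $k = 6$, this is the `1/6` factor applied to `WTDRD1` in the question's data step. The sketch assumes every cycle covers the same length of time and contributes equally.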

You have to work out the contribution of each cycle (i.e. its total weight) and adjust it as a proportion of the desired population year, likely by one or more population characteristics such as age, sex, and geography. At a minimum, these should be the strata and cluster variables. Hopefully the cluster definitions are the same across cycles; otherwise you have to standardize your data to one of the cluster definitions so the data can be combined correctly.

You should pick one of the years as the standard to adjust to. There are a number of arguments about whether to use the middle year, if available, or one of the end points.

There are some details specific to NHANES at https://wwwn.cdc.gov/nchs/nhanes/tutorials/Weighting.aspx, which specifically covers multiple cycles. Follow the links.
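One way to see the contribution of each cycle is to sum the weights by cycle before running the survey procedure. The following is a minimal sketch, assuming the data set and variable names from the question (`nh1.basedata`, `sddsrvyr`, `WTDRD1`, `Weightdiet12YR`). It only inspects the weights and does not replace the NHANES guidance linked above.

```
* Per-cycle totals: the original weight should sum to roughly one;
* population total in each cycle, the rescaled weight to a fraction of it;
proc means data=nh1.basedata n sum maxdec=0;
    class sddsrvyr;
    var WTDRD1 Weightdiet12YR;
run;

* Overall totals across all pooled cycles;
proc means data=nh1.basedata n sum maxdec=0;
    var WTDRD1 Weightdiet12YR;
run;
```

If the overall sum of `Weightdiet12YR` is close to a single population total, the rescaling is doing what it was meant to do. In that case a weighted frequency in the hundreds of millions is an estimated population count rather than a sample count.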
