New SAS User

byrne48 · Posted 04-11-2023 01:01 PM

I am trying to construct a weight for combined surveys using NHANES data. I am combining 6 survey cycles.

I created a new weight variable using this code:

* Construct weight for combined surveys: 6 cycles, so 1/6 of the dietary weight WTDRD1;
data nh1.basedata;
    set nh1.basedata;
    if sddsrvyr in (5, 6, 7, 8, 9, 10) then Weightdiet12YR = (1/6) * WTDRD1;
run;

When I use this weight to generate frequencies for categorical variables, the weighted frequency is an extremely large number, for example for gender (results table is below).

I used proc surveyfreq for this; the code follows the table.


gender	frequency	weighted frequency	std err of wgt freq	percent	std err of percent
1	13265	96910369	2389194	47.9077	0.3588
2	14263	105375165	2517680	52.0923	0.3588
total	27528	202285535	4687511	100.0000

proc surveyfreq data=nh1.basedata1 nomcar;
    tables RIAGENDR age_cat PIR BMICAT SMOKECAT;
    strata sdmvstra;
    cluster sdmvpsu;
    weight Weightdiet12YR;
    where select=2;
run;

Is proc surveyfreq the best method?

How do I fix the error with the weighted frequency? Is it an issue with how the weight is calculated?

Thanks!!!

ballardw · Posted 04-11-2023 01:27 PM

The weighted data for each survey is supposed to yield a total population estimate. When you combine x surveys, the weights will sum to roughly x times the population total.
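To make that arithmetic concrete, here is a rough sketch in notation that does not come from the thread: $W_c$ is the sum of the survey weights in cycle $c$, $N$ is the population total each cycle is meant to estimate, and $k$ is the number of cycles combined. It assumes the population is about the same size in every cycle.

```latex
\sum_{c=1}^{k} W_c \approx kN,
\qquad
\sum_{c=1}^{k} \frac{W_c}{k} \approx N
```

Under that assumption, unscaled weights stacked across $k = 6$ cycles would total about six population sizes, and dividing every weight by $k$ brings the total back to about one.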

You have to work out the contribution of each cycle (i.e. its total weight) and adjust each one as a proportion of the desired population year.
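One way to look at each cycle's contribution is to sum the weights within each cycle. This is only a sketch: it assumes the dataset and variable names from the question (nh1.basedata, sddsrvyr, WTDRD1, Weightdiet12YR) and that the rescaled weight has already been created.

```sas
/* Sum the original and rescaled weights within each survey cycle. */
/* Dataset and variable names are taken from the question above.   */
proc means data=nh1.basedata n sum maxdec=0;
    class sddsrvyr;                /* one output row per survey cycle      */
    var WTDRD1 Weightdiet12YR;     /* original weight and the 1/6 version  */
run;
```

The Sum column for WTDRD1 shows the total weight each cycle carries. The Sum column for Weightdiet12YR shows how much of the combined total each cycle contributes after rescaling.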

The adjustment is likely made by one or more population characteristics such as age, sex and geography. At a minimum these should be the strata and cluster variables. Hopefully the cluster definitions are the same across cycles; if not, you have to standardize your data to one set of cluster definitions so the data can be combined correctly.

You should pick one of the years as the standard to adjust to. There are a number of arguments about whether to use the middle value, if available, or one of the end points.

There are some details specific to NHANES at https://wwwn.cdc.gov/nchs/nhanes/tutorials/Weighting.aspx, which specifically mentions multiple cycles. Follow the links.

Google may be your friend sometimes.
