Abrefah
Calcite | Level 5

I am analyzing data from HINTS 5 Cycles 3 and 4, which I stacked to get 9,303 observations. According to the documentation, the weight and variance-estimation variables for each cycle are:

Cycle 4:
strata VAR_STRATUM;
cluster VAR_CLUSTER;
weight PERSON_FINWT0;

Cycle 3:
strata VAR_STRATUM;
cluster VAR_CLUSTER;
weight NWGT0;

 

Because the weight variables were named differently in each cycle, I created a common weight variable for the two (final_weight).

 

Also, the HINTS data had no identifier that was unique across cycles, so I created one from the household IDs when I stacked the datasets.

 

Below is the code I have run up to this point:

 

/* Step 1: Recode HeardHPVVaccine2 and Gender and Include Other Variables for Cycle 3 */
data cycle3_recoded;
  set tmp1.hints5_cycle3_public;

  /* Recode Gender */
  if GenderC = 1 then Gender = 1;       /* Male */
  else if GenderC = 2 then Gender = 2;  /* Female */
  else Gender = .;                      /* Missing values */

  /* Recode HeardHPVVaccine2 */
  if SEEKCAN = 1 then HeardHPVVaccine2 = 1;                  /* Yes */
  else if SEEKCAN = 2 then HeardHPVVaccine2 = 2;             /* No */
  else if SEEKCAN in (-9, -7) then HeardHPVVaccine2 = .;     /* Missing values */

  /* Rename weight variable */
  final_weight = NWGT0;

  /* Add survey cycle identifier */
  surveycycle = 3;
run;

 

/* Step 2: Recode HeardHPVVaccine2 and Gender and Include Other Variables for Cycle 4 */
data cycle4_recoded;
  set tmp2.hints5_cycle4_public;

  /* Recode Gender */
  if SelfGender = 1 then Gender = 1;       /* Male */
  else if SelfGender = 2 then Gender = 2;  /* Female */
  else Gender = .;                         /* Missing values */

  /* Recode HeardHPVVaccine2 */
  if ELECTRO = 1 then HeardHPVVaccine2 = 1;          /* Yes */
  else if ELECTRO = 2 then HeardHPVVaccine2 = 2;     /* No */
  else if ELECTRO = -9 then HeardHPVVaccine2 = .;    /* Missing values */

  /* Rename weight variable */
  final_weight = PERSON_FINWT0;

  /* Add survey cycle identifier */
  surveycycle = 4;
run;

/* Step 3: Stack the Two Cycles */
data stacked;
  set cycle3_recoded cycle4_recoded;
  newID = CATS(HHID, surveycycle);
run;

/* Step 5: Extract Variables Needed for the Research Question */
data selected;
  set stacked;
  keep RaceEthn5 AgeGrpA EducA MaritalStatus HeardHPV HPVCauseCancer_Cervical
       HeardHPVVaccine2 ExplainedClearly SpentEnoughTime Gender
       final_weight VAR_STRATUM VAR_CLUSTER;
run;

 

After all of the above, I used PROC MEANS to check the combined sampling weight (final_weight) with this code:

proc means data=recode1 n min mean max sum;
var final_weight;
run;

The sum was far too high (1,010,026,682), well above the US population (335,893,238). Will the strata and cluster variables have the same problem?

What should I do?

 

 

1 REPLY
ballardw
Super User

I am not familiar with HINTS data in any form. Each cycle was likely weighted to a population total, so I would expect most methods of renaming the weight variables and stacking the cycles to yield a "population" estimate of roughly N times the population, where N is the number of cycles, assuming no extreme changes in the population between cycles.
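
One way to check this in the stacked data is to sum the weight separately for each cycle. This is a minimal sketch, assuming the `stacked` data set and the `surveycycle` and `final_weight` variables from the question:

```sas
/* Sketch: count observations and sum the combined weight per cycle.     */
/* Data set and variable names are taken from the code in the question.  */
proc means data=stacked n sum;
  class surveycycle;
  var final_weight;
run;
```

If each cycle's sum is close to the population it was weighted to, the overall sum will be roughly the number of cycles times that figure.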

 

One basic approach is to scale each cycle to a proportion of a chosen population total.
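
For two cycles that were each weighted to the full population, the simplest version of this gives each cycle an equal share of the total, which means halving every weight. This is a minimal sketch only. The equal split and the names `stacked_scaled` and `scaled_weight` are my assumptions for illustration, not something taken from the HINTS documentation:

```sas
/* Sketch: give each of the two cycles an equal (1/2) share of the total. */
/* stacked_scaled and scaled_weight are hypothetical names.               */
data stacked_scaled;
  set stacked;
  scaled_weight = final_weight / 2;  /* 2 = number of cycles stacked */
run;

/* The scaled weights should now sum to about one population total. */
proc means data=stacked_scaled n sum;
  var scaled_weight;
run;
```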

 

With BRFSS data, a large-scale complex survey I have worked with, one approach to combining multiple years of Behavioral Risk Factor Surveillance System (BRFSS) data is to adjust the weight variable in proportion to each year's sample size:

1. Determine the sample size for each year.
2. Add the sample sizes together.
3. Calculate the proportion for each year by dividing that year's sample size by the total sample size.
4. Adjust the weight for each year by multiplying the original weight by that year's proportion.
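
Applied to the stacked data from the question, the steps above could be sketched as follows. This assumes the `stacked` data set with `surveycycle` and `final_weight`. The names `cycle_share`, `stacked_adj`, `share` and `adj_weight` are hypothetical. Whether this BRFSS-style adjustment is appropriate for HINTS should be checked against the HINTS documentation.

```sas
/* Steps 1-3: sample size per cycle and its share of the stacked total. */
proc sql;
  create table cycle_share as
  select surveycycle,
         count(*) as n_cycle,
         count(*) / (select count(*) from stacked) as share
  from stacked
  group by surveycycle;
quit;

/* Step 4: multiply each original weight by its cycle's share. */
proc sql;
  create table stacked_adj as
  select s.*,
         s.final_weight * c.share as adj_weight
  from stacked as s
       inner join cycle_share as c
       on s.surveycycle = c.surveycycle;
quit;

/* Check: the adjusted weights should sum to about one population total. */
proc means data=stacked_adj n sum;
  var adj_weight;
run;
```

The adjusted variable (`adj_weight` here) would then replace `final_weight` in the weight statement of the survey procedures.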

 
