I am analyzing data from HINTS 5, Cycles 3 and 4, which I stacked to get 9,303 observations. According to the documentation, the weight and variance-estimation variables for each cycle are:

Cycle 4: strata VAR_STRATUM; cluster VAR_CLUSTER; weight PERSON_FINWT0
Cycle 3: strata VAR_STRATUM; cluster VAR_CLUSTER; weight NWGT0

Because the weight variable has a different name in each cycle, I created a common weight variable for the two (final_weight). There was also no unique identifier across the HINTS data, so I created one from the household IDs and the cycle number, and then I merged the datasets. Below is the code I have applied up to this point:

/* Step 1: Recode HeardHPVVaccine2 and Gender and Include Other Variables for Cycle 3 */
data cycle3_recoded;
    set tmp1.hints5_cycle3_public;

    /* Recode Gender */
    if GenderC = 1 then Gender = 1;        /* Male */
    else if GenderC = 2 then Gender = 2;   /* Female */
    else Gender = .;                       /* Missing values */

    /* Recode HeardHPVVaccine2 */
    if SEEKCAN = 1 then HeardHPVVaccine2 = 1;                 /* Yes */
    else if SEEKCAN = 2 then HeardHPVVaccine2 = 2;            /* No */
    else if SEEKCAN in (-9, -7) then HeardHPVVaccine2 = .;    /* Missing values */

    /* Rename weight variable */
    final_weight = NWGT0;

    /* Add survey cycle identifier */
    surveycycle = 3;
run;

/* Step 2: Recode HeardHPVVaccine2 and Gender and Include Other Variables for Cycle 4 */
data cycle4_recoded;
    set tmp2.hints5_cycle4_public;

    /* Recode Gender */
    if SelfGender = 1 then Gender = 1;        /* Male */
    else if SelfGender = 2 then Gender = 2;   /* Female */
    else Gender = .;                          /* Missing values */

    /* Recode HeardHPVVaccine2 */
    if ELECTRO = 1 then HeardHPVVaccine2 = 1;          /* Yes */
    else if ELECTRO = 2 then HeardHPVVaccine2 = 2;     /* No */
    else if ELECTRO = -9 then HeardHPVVaccine2 = .;    /* Missing values */

    /* Rename weight variable */
    final_weight = PERSON_FINWT0;

    /* Add survey cycle identifier */
    surveycycle = 4;
run;

/* Step 3: Stack the Two Cycles */
data stacked;
    set cycle3_recoded cycle4_recoded;
    newID = CATS(HHID, surveycycle);
run;

/* Step 5: Extract Variables Needed for the Research Question */
data selected;
    set stacked;
    keep RaceEthn5 AgeGrpA EducA MaritalStatus HeardHPV HPVCauseCancer_Cervical
         HeardHPVVaccine2 ExplainedClearly SpentEnoughTime Gender
         final_weight VAR_STRATUM VAR_CLUSTER;
run;

After all of the above, I ran PROC MEANS on the combined sampling weight (final_weight) with this code:

proc means data=recode1 n min mean max sum;
    var final_weight;
run;

The sum of the weights came out far too high (1,010,026,682), greater than the US population (335,893,238). Will the strata and cluster variables have the same problem? What should I do?