Hello programmers,
I want to estimate the risk of developing stroke among people with heart disease, using the recently released NHIS longitudinal dataset. Last year the NHIS released a longitudinal file that lets researchers merge the 2019 and 2020 datasets. My plan is to take a sample of people with heart disease in the 2019 dataset (after applying all exclusion criteria) and check whether they have the outcome (stroke) in the 2020 dataset. I intend to use Cox regression, controlling for baseline covariates.
However, I have not been able to merge the 2019 and 2020 datasets using the joining file. I would appreciate it if anyone could help me merge them so that I can conduct my analysis. The 2019 dataset has over 300k obs, the 2020 dataset has 31k, and the join dataset (people followed up from 2019 to 2020) has 10k.
My initial code is below, but I know I'm missing something. I have also attached the 2019, 2020 and join datasets in Excel.
data merged_data;
  merge nhis.adultlong20 (in=a)
        one (in=b)
        two (in=c);
  by HHX_2019; /* the common variable */
run;
Please post data in a usable form (a data step with datalines). Many users won't open Office files because of the security threats the file format poses.
/*2019 Dataset has this structure*/
RECTYPE          SRVY_YR  HHX      REGION   PSTRAT  PPSU
10 Sample Adult  2019     H048109  3 South  122     2
10 Sample Adult  2019     H027044  3 South  122     2
10 Sample Adult  2019     H058855  3 South  122     2
10 Sample Adult  2019     H031993  3 South  122     2
10 Sample Adult  2019     H007122  3 South  115     2
10 Sample Adult  2019     H007736  3 South  115     2
10 Sample Adult  2019     H040698  3 South  115     1
10 Sample Adult  2019     H022161  3 South  115     1
10 Sample Adult  2019     H017054  3 South  115     1
/*2020 dataset has this structure*/
RECTYPE  SRVY_YR  HHX      REGION               PSTRAT  PPSU
10       2020     H066706  3 Almost everything  103     2
10       2020     H034928  3 Almost everything  103     2
10       2020     H018289  3 Almost everything  103     2
10       2020     H006876  3 Almost everything  103     2
10       2020     H028842  3 Almost everything  103     2
10       2020     H004811  3 Almost everything  103     2
10       2020     H068043  3 Almost everything  103     2
/*joining dataset has this structure*/
RECTYPE                       SRVY_YR      HHX_2019  HHX_2020  WTSA_L
60 Sample Adult Longitudinal  Survey year  H000003   H038763   33091.61
60 Sample Adult Longitudinal  Survey year  H000008   H060250   15476.27
60 Sample Adult Longitudinal  Survey year  H000009   H018743   15994.46
60 Sample Adult Longitudinal  Survey year  H000010   H067593   13108.81
60 Sample Adult Longitudinal  Survey year  H000011   H021952   8211.29
60 Sample Adult Longitudinal  Survey year  H000012   H007892   40972.78
60 Sample Adult Longitudinal  Survey year  H000014   H041303   16721.16
60 Sample Adult Longitudinal  Survey year  H000023   H004538   1852.21
60 Sample Adult Longitudinal  Survey year  H000027   H007281   4790.91
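Given the three structures above, one likely reason the MERGE step fails is that HHX_2019 exists only in the joining file, while the 2019 and 2020 files both call their key HHX. A possible way around this is to join through the link file with PROC SQL, matching HHX_2019 to the 2019 HHX and HHX_2020 to the 2020 HHX. The sketch below is only an illustration: the dataset names link, adult19 and adult20 and the flags heart19 and stroke20 are hypothetical, the first three link rows are copied from the post, and all other values are made up. It also assumes HHX identifies one sample adult per file, which should be checked against the NHIS documentation.

```sas
/* Toy link file: first three rows copied from the joining dataset in the post */
data link;
  input HHX_2019 $ HHX_2020 $ WTSA_L;
  datalines;
H000003 H038763 33091.61
H000008 H060250 15476.27
H000009 H018743 15994.46
;

/* Toy 2019 file: heart19 is a made-up baseline flag, H999999 is a made-up ID */
data adult19;
  input HHX $ heart19;
  datalines;
H000003 1
H000008 0
H999999 1
;

/* Toy 2020 file: stroke20 is a made-up outcome flag, H888888 is a made-up ID */
data adult20;
  input HHX $ stroke20;
  datalines;
H038763 0
H060250 1
H888888 0
;

/* Join through the link file; no sorting or renaming of HHX is needed */
proc sql;
  create table merged_data as
  select l.HHX_2019, l.HHX_2020, l.WTSA_L,
         a.heart19, b.stroke20
  from link as l
       inner join adult19 as a on l.HHX_2019 = a.HHX
       inner join adult20 as b on l.HHX_2020 = b.HHX;
quit;

proc print data=merged_data;
run;
```

With these toy rows, only H000003 and H000008 appear in all three tables, so merged_data has two observations. On the real files, variables that share a name in both years would need an explicit alias in the SELECT list (for example a suffix per year) so that the 2020 values do not collide with the 2019 ones.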