Hi SAS communities,
I am currently trying to match merge 4 data sets into one. There should only be 1 observations per person, which includes the most recent visit. The data set should include people that have data in the 3 data sets: Contact, Demog, and Vitals. Also, most of the people in those data sets have not died and will not have a matching observation in the NDI data set. Conversely, most of the people in the ‘NDI’ data set are not in our cohort (so we don’t want to include those observations). Therefore, when performing a match-merge with the ‘NDI’ data set, the observations from the ‘NDI’ data set which don’t have a matching row in the other 3 data sets should not be included.
DATA HYPANL.HYPANALYSIS2;
RETAIN SSN StateCd GenderCd EthCd RaceCd AgeAtVisit SBP DBP WtLb HypRelDeathInd;
MERGE WORK.VitalLast (IN = InVitalLast)
HypTabs.NDI (IN = InNDI)
HypTabs.Contact (IN = InContact)
HypTabs.DEMOG (IN = InDemog);
BY SSN;
IF InContact = 1
AND InVitalLast = 1
AND InDemog = 1;
/* IF LAST.SSN; */
AgeAtVisit=yrdif(BirthDt, VisitDt, 'AGE');
FORMAT BirthDt YYMMDD10.
AgeAtVisit 4.0;
/* IF InNDI = 0 THEN HypRelDeathInd = 0; */
/* IF InNDI = . THEN HypRelDeathInd = 0; */
/* ELSE IF InNDI = 0 THEN HypRelDeathInd = 0; */
/* ELSE IF InNDI = 1 THEN HypRelDeathInd = 1; */
IF HypRelDeathInd = . THEN HypRelDeathInd = 0;
/* ELSE IF HypRelDeathInd = 0 THEN HypRelDeathInd = 0; */
/* ELSE HypRelDeathInd = 1; */
FORMAT HypRelDeathInd IndVbl.
StateCd $StateCd.;
FORMAT HypRelDeathInd IndVbl. StateCd $StateCd.;
DROP VisitDt HtIn DeathDt ICD10 CODCd City Inits ZipCd EthRaceCd BirthDt;
RUN;
PROC PRINT DATA = HYPANL.HYPANALYSIS2;
RUN;
I have tried several IF THEN statements, but I still get 566 observations. However, with some IF THEN statements I get 11 observations that have a 'Yes' in the 'HypRelDeathInd' variable, and with other IF THEN statements I get 16. Either way, I still end up with 566 observations, which means there are 7 missing observations.
Thanks in advance for any help/tips!
Hi,
UT_RECORDS dataset has 7 observations which have ID with 8 digits. The ID should have typically 9 digits as it is a SSN. Here the first digit of 0 needs to be added to make it work.
This is the reason why 36-78-8134 in HypTabs.CONTACT and HypTabs.DEMOG didn't merge with 036-78-8134 in HypTabs.VITALS.
You need to add the following when you create your CONTACT_UT and DEMOG_UT datasets which you created before:
IDChar = PUT(ID, 9.);
SSN = CATX("-", SUBSTR(IDChar, 1, 3), SUBSTR(IDChar, 4, 2), SUBSTR(IDChar, 6, 4));
IF ID < 100000000 THEN SSN = "0" || SSN;
This ensures that if the ID has 8 digits, then a 0 is added as a prefix to the SSN.
Once you make this change and generate the updated CONTACT and DEMOG datasets, you should get 573 observations.
Please let me know if this doesn't work.
Thanks,
Adhuna Mukhopadhyay
Have you tried using the YDRIF statement?
AgeAtVisit = YRDIF(BirthDt, VisitDt, 'Age');
FORMAT AgeAtVisit 2.0;
YRDIF is not a statement. It is a function. Proper terminology matters in order to communicate effectively.
Also, random uppercase and lowercase letters throughout the code makes it very hard to read. Same for no indentation and no alignment.
Choose a format you like and stick to it. This will save you invaluable time. For example, language in lower case and data/user words in uppercase, like:
merge HYPTABS.DEMOG_1 (in= INDEMOG drop= ETHRACECD )
HYPTABS.CONTACT_1 (in= INCONTACT keep= SSN STATECD )
WORK.LASTVITALS (in= INLASTVITALS drop= HTIN )
HYPTABS.NDI_1 (in= INNDI drop= DEATHDT ICD10 CODCD) ;
I haven't studied your code, but note that NONE of the in dummies (InVitalLast InNDI InContact InDemog) will ever have a missing value. They will be either zero or one.
> The data set should include people that have data in the 3 data sets: Contact, Demog, and Vitals ... 8>< ... the observations from the ‘NDI’ data set which don’t have a matching row in the other 3 data sets should not be included.
Surely this clause you wrote implements all these criteria?
IF InContact = 1
AND InVitalLast = 1
AND InDemog = 1;
> I have tried several IF THEN statements, but I still get 566 observations. However, with some IF THEN statements I get 11 observations that have a 'Yes' in the 'HypRelDeathInd' variable, and with other IF THEN statements I get 16. Either way, I still end up with 566 observations, which means there are 7 missing observations.
I have no idea what you are after. Provide some sample data?
Hi,
UT_RECORDS dataset has 7 observations which have ID with 8 digits. The ID should have typically 9 digits as it is a SSN. Here the first digit of 0 needs to be added to make it work.
This is the reason why 36-78-8134 in HypTabs.CONTACT and HypTabs.DEMOG didn't merge with 036-78-8134 in HypTabs.VITALS.
You need to add the following when you create your CONTACT_UT and DEMOG_UT datasets which you created before:
IDChar = PUT(ID, 9.);
SSN = CATX("-", SUBSTR(IDChar, 1, 3), SUBSTR(IDChar, 4, 2), SUBSTR(IDChar, 6, 4));
IF ID < 100000000 THEN SSN = "0" || SSN;
This ensures that if the ID has 8 digits, then a 0 is added as a prefix to the SSN.
Once you make this change and generate the updated CONTACT and DEMOG datasets, you should get 573 observations.
Please let me know if this doesn't work.
Thanks,
Adhuna Mukhopadhyay
@adhunamukherjee With that kind of data, it's probably easier to merge numeric values.
if ID = input(compress(CHAR,,'dk'),10.);
Thank you so much! This worked!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.