I have multiple data sets containing specific information about individuals. Ideally, I would have been able to merge the data sets. Unfortunately, I have too many duplicates, the IDs differ in each data set, and there are typos and spelling errors in the IDs that do overlap. So, I'm planning to check all the potential ID combinations to see if they exist in the different datasets. If they do, then I want a yes/no value in my "master" data.
This is essentially what I'm hoping to achieve:
data have1; input (ID ) ($); datalines; 1D 2G 3F 4C 5H ; data have2; input (ID ) ($); datalines; 1D 3F 5H 8J 9K ; data master; input (ID ID_Match) ($); datalines; 1D YES 2G NO 3F YES 4C NO 5H YES ;
I'll greatly appreciate some help!
PROC COMPARE will do what you want, or a merge will get you started as well.
EDIT: Modified per comments. This will give you exactly what you requested, a more common request, is to identify which data set the record came from, which was closer to my original answer.
data want;
merge have1 (in=t1) have2 (in=t2);
BY ID;
if t1 and t2 then ID_MATCH = "YES";
else ID_MATCH = "NO";
run;
Thank you for your help, @Reeza
I've adapted your code a bit to get it to run on EG:
data want;
merge have1 (in=t1) have2 (in=t2);
if t1 and t2 then
ID_MATCH = "YES";
else if t1 then
ID_MATCH = "T1";
else ID_MATCH = "T2";
run;
Unfortunately, the output I got when running your code looked like this:
ID | ID_Match |
1D | YES |
3F | YES |
5H | YES |
8J | YES |
9K | YES |
ID_Match should only = YES if have1 ID is also present in have2 ID. The output I'm hoping to achieve looks like this:
ID | ID_Match |
1D | YES |
2G | NO |
3F | YES |
4C | NO |
5H | YES |
@Reeza , your output is better than what I expected. Thank you!
data want;
merge have1 (in=t1) have2 (in=t2);
if t1 and t2 then
ID_MATCH = "YES";
else if t1 then
ID_MATCH = "T1";
else ID_MATCH = "T2";
BY ID;
run;
@Reeza , I also had a look at the output of PROC COMPARE. Unfortunately, it isn't obvious to me how I can use the output to join my datasets or create a dataset with columns that indicates if a key ID is present in both sets.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.