Solved: Re: How do I check if multiple IDs are present in other data sets

avz · Posted 05-02-2021 01:46 PM

I have multiple data sets containing specific information about individuals. Ideally, I would have been able to merge the data sets. Unfortunately, I have too many duplicates, the IDs differ in each data set, and there are typos and spelling errors in the IDs that do overlap. So, I'm planning to check all the potential ID combinations to see if they exist in the different datasets. If they do, then I want a yes/no value in my "master" data.

This is essentially what I'm hoping to achieve:

data have1;
	input (ID ) ($);
	datalines;
	1D                      
	2G                      
	3F                       
	4C                       
	5H                       
;
data have2;
	input (ID ) ($);
	datalines;
	1D                      
	3F                      
	5H                       
	8J                       
	9K                       
;
data master;
	input (ID ID_Match) ($);
	datalines;
	1D    YES                  
	2G    NO                  
	3F    YES                   
	4C    NO                   
	5H    YES                   
;

I'll greatly appreciate some help!

Reeza · Posted 05-04-2021 12:35 PM

Forgot the BY statement. Add BY ID.

View solution in original post

Reeza · Posted 05-02-2021 02:37 PM

PROC COMPARE will do what you want, or a merge will get you started as well.

EDIT: Modified per comments. This will give you exactly what you requested, a more common request, is to identify which data set the record came from, which was closer to my original answer.

data want;
merge have1 (in=t1) have2 (in=t2);
BY ID;

if t1 and t2 then ID_MATCH = "YES";
else ID_MATCH = "NO";

run;

avz · Posted 05-04-2021 11:22 AM

Thank you for your help, @Reeza

I've adapted your code a bit to get it to run on EG:

data want;
	merge have1 (in=t1) have2 (in=t2);
	if t1 and t2 then
		ID_MATCH = "YES";
	else if t1 then
		ID_MATCH = "T1";
	else ID_MATCH = "T2";
run;

Unfortunately, the output I got when running your code looked like this:

ID	ID_Match
1D	YES
3F	YES
5H	YES
8J	YES
9K	YES

ID_Match should only = YES if have1 ID is also present in have2 ID. The output I'm hoping to achieve looks like this:

ID	ID_Match
1D	YES
2G	NO
3F	YES
4C	NO
5H	YES

Reeza · Posted 05-04-2021 12:35 PM

Forgot the BY statement. Add BY ID.

avz · Posted 05-04-2021 04:18 PM

@Reeza , your output is better than what I expected. Thank you!

data want;
	merge have1 (in=t1) have2 (in=t2);
	if t1 and t2 then
		ID_MATCH = "YES";
	else if t1 then
		ID_MATCH = "T1";
	else ID_MATCH = "T2";
BY ID;
run;

avz · Posted 05-04-2021 11:27 AM

@Reeza , I also had a look at the output of PROC COMPARE. Unfortunately, it isn't obvious to me how I can use the output to join my datasets or create a dataset with columns that indicates if a key ID is present in both sets.

How do I check if multiple IDs are present in other data sets

Re: How do I check if multiple IDs are present in other data sets

Re: How do I check if multiple IDs are present in other data sets

Re: How do I check if multiple IDs are present in other data sets

Re: How do I check if multiple IDs are present in other data sets

Re: How do I check if multiple IDs are present in other data sets

Re: How do I check if multiple IDs are present in other data sets

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away