BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
avz
Obsidian | Level 7 avz
Obsidian | Level 7

I have multiple data sets containing specific information about individuals. Ideally, I would have been able to merge the data sets. Unfortunately, I have too many duplicates, the IDs differ in each data set, and there are typos and spelling errors in the IDs that do overlap. So, I'm planning to check all the potential ID combinations to see if they exist in the different datasets. If they do, then I want a yes/no value in my "master" data.

 

This is essentially what I'm hoping to achieve: 

 

data have1;
	input (ID ) ($);
	datalines;
	1D                      
	2G                      
	3F                       
	4C                       
	5H                       
;
data have2;
	input (ID ) ($);
	datalines;
	1D                      
	3F                      
	5H                       
	8J                       
	9K                       
;
data master;
	input (ID ID_Match) ($);
	datalines;
	1D    YES                  
	2G    NO                  
	3F    YES                   
	4C    NO                   
	5H    YES                   
;

I'll greatly appreciate some help! 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
5 REPLIES 5
Reeza
Super User

PROC COMPARE will do what you want, or a merge will get you started as well.

EDIT: Modified per comments. This will give you exactly what you requested, a more common request, is to identify which data set the record came from, which was closer to my original answer.

data want;
merge have1 (in=t1) have2 (in=t2);
BY ID;

if t1 and t2 then ID_MATCH = "YES";
else ID_MATCH = "NO";

run;



avz
Obsidian | Level 7 avz
Obsidian | Level 7

Thank you for your help, @Reeza 

 

I've adapted your code a bit to get it to run on EG:

 

 

data want;
	merge have1 (in=t1) have2 (in=t2);
	if t1 and t2 then
		ID_MATCH = "YES";
	else if t1 then
		ID_MATCH = "T1";
	else ID_MATCH = "T2";
run;

Unfortunately, the output I got when running your code looked like this:

 

 

IDID_Match
1DYES
3FYES
5HYES
8JYES
9KYES

 

ID_Match should only = YES if have1 ID is also present in have2 ID. The output I'm hoping to achieve looks like this:

 

IDID_Match
1DYES
2GNO
3FYES
4CNO
5HYES

 

 

Reeza
Super User
Forgot the BY statement. Add BY ID.
avz
Obsidian | Level 7 avz
Obsidian | Level 7

@Reeza , your output is better than what I expected. Thank you! 

 

data want;
	merge have1 (in=t1) have2 (in=t2);
	if t1 and t2 then
		ID_MATCH = "YES";
	else if t1 then
		ID_MATCH = "T1";
	else ID_MATCH = "T2";
BY ID;
run;
avz
Obsidian | Level 7 avz
Obsidian | Level 7

@Reeza , I also had a look at the output of PROC COMPARE. Unfortunately, it isn't obvious to me how I can use the output to join my datasets or create a dataset with columns that indicates if a key ID is present in both sets. 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1469 views
  • 0 likes
  • 2 in conversation