How can I merge two dataset based on matching between two variables

Accepted Solution Solved
Reply
Contributor
Posts: 37
Accepted Solution

How can I merge two dataset based on matching between two variables

I want to merge datasets based on matching two variables. So for example, if an observation has both a SUBJECTID AND a PROGRAM matching then merge.

 

How can I accomplish this?

 

 


Accepted Solutions
Solution
‎12-06-2016 02:40 PM
Super User
Posts: 11,343

Re: How can I merge two dataset based on matching between two variables

Posted in reply to confused_saser

Perhaps this is what you are looking for:

proc sort data=data1; by subjectid program;run;
proc sort data=data2; by subjectid program;run;

data want;
   merge data1  (in=indata1)
         data2  (in=indata2)
   ;
   by subjectid program;
   if indata1 and indata2;
run;

The dataset option IN creates a temporary named variable (the bit after the =) that has a value of 1 (or true) if the current record is contributed to by that dataset. If both of the in varaibles are true then the value of subjectid and program match.

 

 

NOTE: if you have multiple values of the combination of subjectid and program then this approach may not be what you want. The SAS procedure PROC SQL provides for more complex repeated matches.

View solution in original post


All Replies
Super User
Posts: 11,343

Re: How can I merge two dataset based on matching between two variables

Posted in reply to confused_saser

Depending on exactly what you are wanting there are a number of ways possible. It would be best to post a few rows of example data from each dataset, 4 or 5 should do, and what the desired result would look like for the example data. Also if you have a possibilit of no match describe show how one of those cases would be handled.

Contributor
Posts: 37

Re: How can I merge two dataset based on matching between two variables

Here is an example dataset:

 

DATA data1; 
  INPUT subjectID program $ name $ ; 
CARDS; 
545454 A  Michael 
656565 B  Sara 
545454 C  Josh 
545454 D  Jessica

; 
RUN; 

DATA data2; 
  INPUT subjectID program $ height ; 
CARDS; 
545454 A  163
656565 B  120 
545454 C  20 
545454 D  256
545454 .  123
545452 A  456
; 
RUN; 

And this is the output when I merge in the attached.

 

But line 4 and 5 are incorect. What I want to happen is to only merge the observations if both SUBJECTID and PROGRAM match. So what line 4 and 5 should be is

 

 

545454 D  Jessica 256

 

Does that make sense?

 

I want it to merge only if subjectid and program match in the datasets, if they don't match then just leave the fields blank

 


output.png
Solution
‎12-06-2016 02:40 PM
Super User
Posts: 11,343

Re: How can I merge two dataset based on matching between two variables

Posted in reply to confused_saser

Perhaps this is what you are looking for:

proc sort data=data1; by subjectid program;run;
proc sort data=data2; by subjectid program;run;

data want;
   merge data1  (in=indata1)
         data2  (in=indata2)
   ;
   by subjectid program;
   if indata1 and indata2;
run;

The dataset option IN creates a temporary named variable (the bit after the =) that has a value of 1 (or true) if the current record is contributed to by that dataset. If both of the in varaibles are true then the value of subjectid and program match.

 

 

NOTE: if you have multiple values of the combination of subjectid and program then this approach may not be what you want. The SAS procedure PROC SQL provides for more complex repeated matches.

Contributor
Posts: 37

Re: How can I merge two dataset based on matching between two variables

Thanks @ballardw.

 

How can I also spit out a dataset that contains all the observations that did not merge?

 

I tried something like this but I don't think I understand the syntax enough:

 

data want;
   merge data1  (in=indata1)
         data2  (in=indata2)
   ;
   by subjectid program;
   if indata1 and indata2 then output mergeddata;
   else if then output nomerge;
run;
Contributor
Posts: 37

Re: How can I merge two dataset based on matching between two variables

Posted in reply to confused_saser

Nevermind! I got it! 

 

data want nomerge;
   merge data1  (in=indata1)
         data2  (in=indata2)
   ;
   by subjectid program;
   if indata1 and indata2 then output want;
   else if indata2 then output nomerge;
run;

Thank you!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 236 views
  • 3 likes
  • 2 in conversation