datasets match merging

Accepted Solution Solved
Reply
Contributor
Posts: 43
Accepted Solution

datasets match merging

Dear all,

 

I have an exercise for a class that requires me to merge 2 datasets and I don't think I am getting it right.

The first dataset is class.excel1 with 1166 observations and 28 variables, and second dataset is class.excel2 with 882 observation and 9 variables.

Both dataset have a common variable called "serial_number" and I am suppose to merge the 2 dataset by this variable. 

class.excel1 (the bigger dataset) should contain all the serial_number that class.excel2 (the smaller dataset) has, and my goal is the to match the serial_number between the 2 datasets and only keep those observations that the serial_number matched.

Right now my code is (after sorting by serial_number):

data class.merge;

merge class.excel1 class.excel2;

by serial_number;

run;

 

The log says:

The data set class.merge has 1476 observations and 36 variables.

 

However, there should be only 882 observations or less, so there must be something off and I am not sure what went wrong and how to fix it.

Any advice is appreciated! Thanks!


Accepted Solutions
Solution
‎04-24-2017 12:31 AM
Trusted Advisor
Posts: 1,128

Re: datasets match merging

Please try

 

data class.merge;
merge class.excel1(in=a) class.excel2(in=b);
by serial_number;
if a and b;
run;
Thanks,
Jag

View solution in original post


All Replies
Super User
Posts: 17,801

Re: datasets match merging

Contributor
Posts: 43

Re: datasets match merging

Thank you. But I did try that:

data class.merge;

merge class.excel1 class.excel2 (in=frodo);

by serial_number;

if frodo;

run;

 

and log says I have 1166 observations and 36 variables.

Is my in option not quite right?

Trusted Advisor
Posts: 1,128

Re: datasets match merging

It means that there are multiple records by the same serial_number in class.excel1 due to which you are getting all of them into class.merge.
Thanks,
Jag
Solution
‎04-24-2017 12:31 AM
Trusted Advisor
Posts: 1,128

Re: datasets match merging

Please try

 

data class.merge;
merge class.excel1(in=a) class.excel2(in=b);
by serial_number;
if a and b;
run;
Thanks,
Jag
Contributor
Posts: 43

Re: datasets match merging

I see! thank you so much @Jagadishkatam !

So I need the in option for both datasets.

 

I have now 654 observations and 36 variables.

However, another classmate of mine used R to merge and he only got 6 observations that did not match.

 

Is there a way to print a list of observations that were not matched?

much appreciated.

Super User
Posts: 17,801

Re: datasets match merging

Rather than using IN to filter use it to create a flag. 

 

Data want;

set data1 (in=a) data2(in=b);

length flag $8.;

if a and b then flag ='MATCH';

Else if a then flag = 'A';

else if b then flag = 'B';

run;

Contributor
Posts: 43

Re: datasets match merging

Thank you @Reeza!

Trusted Advisor
Posts: 1,128

Re: datasets match merging

If b and not a;

This will give the observation that did not match.
Thanks,
Jag
Contributor
Posts: 43

Re: datasets match merging

Thank you @Jagadishkatam! This is life changing :-)

Frequent Contributor
Frequent Contributor
Posts: 89

Re: datasets match merging

I would check both datasets for duplicates. You don't say anywhere that there are no duplicate records.  Duplicate records can make a mess out of matching. 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 217 views
  • 0 likes
  • 4 in conversation