SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michan22
Quartz | Level 8

Dear all,

 

I have an exercise for a class that requires me to merge 2 datasets and I don't think I am getting it right.

The first dataset is class.excel1 with 1166 observations and 28 variables, and second dataset is class.excel2 with 882 observation and 9 variables.

Both dataset have a common variable called "serial_number" and I am suppose to merge the 2 dataset by this variable. 

class.excel1 (the bigger dataset) should contain all the serial_number that class.excel2 (the smaller dataset) has, and my goal is the to match the serial_number between the 2 datasets and only keep those observations that the serial_number matched.

Right now my code is (after sorting by serial_number):

data class.merge;

merge class.excel1 class.excel2;

by serial_number;

run;

 

The log says:

The data set class.merge has 1476 observations and 36 variables.

 

However, there should be only 882 observations or less, so there must be something off and I am not sure what went wrong and how to fix it.

Any advice is appreciated! Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Jagadishkatam
Amethyst | Level 16

Please try

 

data class.merge;
merge class.excel1(in=a) class.excel2(in=b);
by serial_number;
if a and b;
run;
Thanks,
Jag

View solution in original post

10 REPLIES 10
michan22
Quartz | Level 8

Thank you. But I did try that:

data class.merge;

merge class.excel1 class.excel2 (in=frodo);

by serial_number;

if frodo;

run;

 

and log says I have 1166 observations and 36 variables.

Is my in option not quite right?

Jagadishkatam
Amethyst | Level 16
It means that there are multiple records by the same serial_number in class.excel1 due to which you are getting all of them into class.merge.
Thanks,
Jag
Jagadishkatam
Amethyst | Level 16

Please try

 

data class.merge;
merge class.excel1(in=a) class.excel2(in=b);
by serial_number;
if a and b;
run;
Thanks,
Jag
michan22
Quartz | Level 8

I see! thank you so much @Jagadishkatam !

So I need the in option for both datasets.

 

I have now 654 observations and 36 variables.

However, another classmate of mine used R to merge and he only got 6 observations that did not match.

 

Is there a way to print a list of observations that were not matched?

much appreciated.

Reeza
Super User

Rather than using IN to filter use it to create a flag. 

 

Data want;

set data1 (in=a) data2(in=b);

length flag $8.;

if a and b then flag ='MATCH';

Else if a then flag = 'A';

else if b then flag = 'B';

run;

michan22
Quartz | Level 8

Thank you @Reeza!

Jagadishkatam
Amethyst | Level 16
If b and not a;

This will give the observation that did not match.
Thanks,
Jag
michan22
Quartz | Level 8

Thank you @Jagadishkatam! This is life changing 🙂

HB
Barite | Level 11 HB
Barite | Level 11

I would check both datasets for duplicates. You don't say anywhere that there are no duplicate records.  Duplicate records can make a mess out of matching. 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 7075 views
  • 0 likes
  • 4 in conversation