BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michan22
Quartz | Level 8

Dear all,

 

I have an exercise for a class that requires me to merge 2 datasets and I don't think I am getting it right.

The first dataset is class.excel1 with 1166 observations and 28 variables, and second dataset is class.excel2 with 882 observation and 9 variables.

Both dataset have a common variable called "serial_number" and I am suppose to merge the 2 dataset by this variable. 

class.excel1 (the bigger dataset) should contain all the serial_number that class.excel2 (the smaller dataset) has, and my goal is the to match the serial_number between the 2 datasets and only keep those observations that the serial_number matched.

Right now my code is (after sorting by serial_number):

data class.merge;

merge class.excel1 class.excel2;

by serial_number;

run;

 

The log says:

The data set class.merge has 1476 observations and 36 variables.

 

However, there should be only 882 observations or less, so there must be something off and I am not sure what went wrong and how to fix it.

Any advice is appreciated! Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Jagadishkatam
Amethyst | Level 16

Please try

 

data class.merge;
merge class.excel1(in=a) class.excel2(in=b);
by serial_number;
if a and b;
run;
Thanks,
Jag

View solution in original post

10 REPLIES 10
michan22
Quartz | Level 8

Thank you. But I did try that:

data class.merge;

merge class.excel1 class.excel2 (in=frodo);

by serial_number;

if frodo;

run;

 

and log says I have 1166 observations and 36 variables.

Is my in option not quite right?

Jagadishkatam
Amethyst | Level 16
It means that there are multiple records by the same serial_number in class.excel1 due to which you are getting all of them into class.merge.
Thanks,
Jag
Jagadishkatam
Amethyst | Level 16

Please try

 

data class.merge;
merge class.excel1(in=a) class.excel2(in=b);
by serial_number;
if a and b;
run;
Thanks,
Jag
michan22
Quartz | Level 8

I see! thank you so much @Jagadishkatam !

So I need the in option for both datasets.

 

I have now 654 observations and 36 variables.

However, another classmate of mine used R to merge and he only got 6 observations that did not match.

 

Is there a way to print a list of observations that were not matched?

much appreciated.

Reeza
Super User

Rather than using IN to filter use it to create a flag. 

 

Data want;

set data1 (in=a) data2(in=b);

length flag $8.;

if a and b then flag ='MATCH';

Else if a then flag = 'A';

else if b then flag = 'B';

run;

michan22
Quartz | Level 8

Thank you @Reeza!

Jagadishkatam
Amethyst | Level 16
If b and not a;

This will give the observation that did not match.
Thanks,
Jag
michan22
Quartz | Level 8

Thank you @Jagadishkatam! This is life changing 🙂

HB
Barite | Level 11 HB
Barite | Level 11

I would check both datasets for duplicates. You don't say anywhere that there are no duplicate records.  Duplicate records can make a mess out of matching. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 5747 views
  • 0 likes
  • 4 in conversation