Dear all,
I have an exercise for a class that requires me to merge 2 datasets and I don't think I am getting it right.
The first dataset is class.excel1 with 1166 observations and 28 variables, and second dataset is class.excel2 with 882 observation and 9 variables.
Both dataset have a common variable called "serial_number" and I am suppose to merge the 2 dataset by this variable.
class.excel1 (the bigger dataset) should contain all the serial_number that class.excel2 (the smaller dataset) has, and my goal is the to match the serial_number between the 2 datasets and only keep those observations that the serial_number matched.
Right now my code is (after sorting by serial_number):
data class.merge;
merge class.excel1 class.excel2;
by serial_number;
run;
The log says:
The data set class.merge has 1476 observations and 36 variables.
However, there should be only 882 observations or less, so there must be something off and I am not sure what went wrong and how to fix it.
Any advice is appreciated! Thanks!
Please try
data class.merge;
merge class.excel1(in=a) class.excel2(in=b);
by serial_number;
if a and b;
run;
Use the IN option to make sure a record is in both datasets.
Thank you. But I did try that:
data class.merge;
merge class.excel1 class.excel2 (in=frodo);
by serial_number;
if frodo;
run;
and log says I have 1166 observations and 36 variables.
Is my in option not quite right?
Please try
data class.merge;
merge class.excel1(in=a) class.excel2(in=b);
by serial_number;
if a and b;
run;
I see! thank you so much @Jagadishkatam !
So I need the in option for both datasets.
I have now 654 observations and 36 variables.
However, another classmate of mine used R to merge and he only got 6 observations that did not match.
Is there a way to print a list of observations that were not matched?
much appreciated.
Rather than using IN to filter use it to create a flag.
Data want;
set data1 (in=a) data2(in=b);
length flag $8.;
if a and b then flag ='MATCH';
Else if a then flag = 'A';
else if b then flag = 'B';
run;
Thank you @Reeza!
Thank you @Jagadishkatam! This is life changing 🙂
I would check both datasets for duplicates. You don't say anywhere that there are no duplicate records. Duplicate records can make a mess out of matching.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.