- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear all,
I have an exercise for a class that requires me to merge 2 datasets and I don't think I am getting it right.
The first dataset is class.excel1 with 1166 observations and 28 variables, and second dataset is class.excel2 with 882 observation and 9 variables.
Both dataset have a common variable called "serial_number" and I am suppose to merge the 2 dataset by this variable.
class.excel1 (the bigger dataset) should contain all the serial_number that class.excel2 (the smaller dataset) has, and my goal is the to match the serial_number between the 2 datasets and only keep those observations that the serial_number matched.
Right now my code is (after sorting by serial_number):
data class.merge;
merge class.excel1 class.excel2;
by serial_number;
run;
The log says:
The data set class.merge has 1476 observations and 36 variables.
However, there should be only 882 observations or less, so there must be something off and I am not sure what went wrong and how to fix it.
Any advice is appreciated! Thanks!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please try
data class.merge;
merge class.excel1(in=a) class.excel2(in=b);
by serial_number;
if a and b;
run;
Jag
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Use the IN option to make sure a record is in both datasets.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you. But I did try that:
data class.merge;
merge class.excel1 class.excel2 (in=frodo);
by serial_number;
if frodo;
run;
and log says I have 1166 observations and 36 variables.
Is my in option not quite right?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Jag
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please try
data class.merge;
merge class.excel1(in=a) class.excel2(in=b);
by serial_number;
if a and b;
run;
Jag
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I see! thank you so much @Jagadishkatam !
So I need the in option for both datasets.
I have now 654 observations and 36 variables.
However, another classmate of mine used R to merge and he only got 6 observations that did not match.
Is there a way to print a list of observations that were not matched?
much appreciated.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Rather than using IN to filter use it to create a flag.
Data want;
set data1 (in=a) data2(in=b);
length flag $8.;
if a and b then flag ='MATCH';
Else if a then flag = 'A';
else if b then flag = 'B';
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you @Reeza!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This will give the observation that did not match.
Jag
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you @Jagadishkatam! This is life changing 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would check both datasets for duplicates. You don't say anywhere that there are no duplicate records. Duplicate records can make a mess out of matching.