04-24-2017 12:17 AM
I have an exercise for a class that requires me to merge 2 datasets and I don't think I am getting it right.
The first dataset is class.excel1 with 1166 observations and 28 variables, and second dataset is class.excel2 with 882 observation and 9 variables.
Both dataset have a common variable called "serial_number" and I am suppose to merge the 2 dataset by this variable.
class.excel1 (the bigger dataset) should contain all the serial_number that class.excel2 (the smaller dataset) has, and my goal is the to match the serial_number between the 2 datasets and only keep those observations that the serial_number matched.
Right now my code is (after sorting by serial_number):
merge class.excel1 class.excel2;
The log says:
The data set class.merge has 1476 observations and 36 variables.
However, there should be only 882 observations or less, so there must be something off and I am not sure what went wrong and how to fix it.
Any advice is appreciated! Thanks!
04-24-2017 12:22 AM
Use the IN option to make sure a record is in both datasets.
04-24-2017 12:25 AM
Thank you. But I did try that:
merge class.excel1 class.excel2 (in=frodo);
and log says I have 1166 observations and 36 variables.
Is my in option not quite right?
04-24-2017 12:30 AM
I see! thank you so much @Jagadishkatam !
So I need the in option for both datasets.
I have now 654 observations and 36 variables.
However, another classmate of mine used R to merge and he only got 6 observations that did not match.
Is there a way to print a list of observations that were not matched?
04-24-2017 12:33 AM
Rather than using IN to filter use it to create a flag.
set data1 (in=a) data2(in=b);
length flag $8.;
if a and b then flag ='MATCH';
Else if a then flag = 'A';
else if b then flag = 'B';
Need further help from the community? Please ask a new question.