Hello,
I am trying to merge two datasets based on a numeric Id and at the same time creating a match flag for them. I need all records from x1 data whereas only those records that match from x2. The code I am using is shown below.
data matches nonmatches;
merge x1(in=a) x2(in=b);
by id;
if a and b then id_flag = 'Y';
if a then output matches;
else output nonmatches;
run;
While I try to look up the resulting nonmatches again in the datasets, I see a few ids that match but have not been merged or flagged. Could someone please tell me where I am going wrong or any other way to do this.
Thank you in advance.
Example data?
Do both of the data sets have multiple records with the same value of Id? If so you'll only get matches for the set with the lowest number of duplicates.
It will help to show a small set of data from BOTH sets and what you expect for a result. You need not include all of the other variables than Id but 1 or 2 will help show which record goes with which.
Thank you for the response.
Here is how the data looks like. It is a many to one merge. X1 data has a single entry for each id whereas X2 can have multiple entries with same id.
X1
Id | Name | Address |
12 | ABC | 23 N W Ave |
23 | XYZ | 1022 Rue De |
34 | PRAX | 1 Mclean Street |
45 | AFHT | 509 Rochester Ave |
X2
Id | ManufactureId | Status |
12 | 290498 | Pending |
12 | 290498 |
|
23 | 48 | Active |
23 | 48 | Pending |
45 | 2455 |
|
Posting HTML tables of data is better than photographs of data. But posting actual data step code that creates the dataset is much better for sharing with others.
data x1 ;
length id 8 name $20 address $50 ;
infile cards dsd dlm='|' truncover;
input id name address;
cards;
12|ABC|23 N W Ave
23|XYZ|1022 Rue De
34|PRAX|1 Mclean Street
45|AFHT|509 Rochester Ave
;
data x2 ;
length id 8 manufactureid 8 status $10 ;
infile cards dsd dlm='|' truncover;
input id manufactureid status;
cards;
12|290498|Pending
12|290498|.
23|48|Active
23|48|Pending
45|2455|.
;
Here is what SAS will do if you merge by ID. I saved the IN= variable to permanent variables so we can see their status in the saved dataset.
data want;
merge x1(in=a) x2(in=b);
by id;
in_x1=a;
in_x2=b;
run;
Which rows do you want to output to which of your two tables?
Obs id name address manufactureid status in_x1 in_x2 1 12 ABC 23 N W Ave 290498 Pending 1 1 2 12 ABC 23 N W Ave 290498 1 1 3 23 XYZ 1022 Rue De 48 Active 1 1 4 23 XYZ 1022 Rue De 48 Pending 1 1 5 34 PRAX 1 Mclean Street . 1 0 6 45 AFHT 509 Rochester Ave 2455 1 1
Can you supply some example data where you are noticing the problem you described. A solution will be dependent upon whether this is a one to one, one to many, or many to many merge.
Art, CEO, AnalystFinder.com
Thank you for the response.
Here is how the data looks like. It is a many to one merge. X1 data has a single entry for each id whereas X2 can have multiple entries with same id.
X1
Id | Name | Address |
12 | ABC | 23 N W Ave |
23 | XYZ | 1022 Rue De |
34 | PRAX | 1 Mclean Street |
45 | AFHT | 509 Rochester Ave |
X2
Id | ManufactureId | Status |
12 | 290498 | Pending |
12 | 290498 |
|
23 | 48 | Active |
23 | 48 | Pending |
45 | 2455 |
|
and what do you expect the two output datasets to look like? Your code appears to be functioning correctly.
Art, CEO, AnalystFinder.com
The output should be X1 with X2 data merged columns. If there are duplicates due to 'Status' column in X2, i can deal with it later.
Output
Id | Name | Address | ManufactureId | Status |
12 | ABC | 23 N W Ave | 290498 | Pending |
23 | XYZ | 1022 Rue De | 48 | Active |
34 | PRAX | 1 Mclean Street |
|
|
45 | AFHT | 509 Rochester Ave | 2455 |
|
Why did you pick STATUS=ACTIVE for ID=23? It also has a record where STATUS=PENDING.
How should the program know which record to use?
and, in addition to @Tom's question, what do you expect/want to see in the nonmatches dataset?
Art, CEO, AnalystFinder.com
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.