I have 2 datasets - dataset A and B, as shown below. When I merge by ID, I get the resulting dataset, also shown below. What I would like to have, is for record 1 in dataset A to merge with record_ 3 in dataset B, and record 2 with record_ 4, so that record 1 doesn't go ahead and merge with record_ 4, and record_ 3 is not available for merging with record 2. This is shown below in "Desired Merged Dataset".
Thanks in advance!
Dataset A | ||||
Record | ID | Qty | ||
1 | ABC | 30 | ||
2 | ABC | 30 | ||
Dataset B | ||||
Record_ | ID | Qty_ | ||
3 | ABC | -30 | ||
4 | ABC | -30 | ||
Resulting Merged Dataset | ||||
ID | Record | Record_ | Qty | Qty_ |
ABC | 1 | 3 | 30 | -30 |
ABC | 1 | 4 | 30 | -30 |
ABC | 2 | 3 | 30 | -30 |
ABC | 2 | 4 | 30 | -30 |
Desired Merged Dataset | ||||
ID | Record | Record_ | Qty | Qty_ |
ABC | 1 | 3 | 30 | -30 |
ABC | 2 | 4 | 30 | -30 |
I guess what you did is SQL "join" instead of data step "merge", so here is the merge:
data A;
input Record ID:$ Qty;
cards;
1 ABC 30
2 ABC 30
;
Data B;
input Record_ ID:$ Qty_;
cards;
3 ABC -30
4 ABC -30
;
data c;
merge a b;
by id;
run;
proc print;run;
Haikuo
It looks like you are searching the MERGE statement without BY .
data c;
merge a b;
run;
If both tables have same amount of obs, then "merge without by" = "set; set;"
data c;
set a;
set b;
run;
Haikuo
Thanks for your reply. I agree that your suggestions would achieve the desired outcome when there are equal no. of observation per ID in the datasets. However, in my case, the dataset may not have same amount of observations for all IDs, see for example:
Dataset A | ||
Record | ID | Qty |
1 | ABC | 30 |
2 | EFG | 30 |
3 | MNO | 30 |
4 | MNO | 30 |
5 | XYZ | 30 |
6 | XYZ | 30 |
Dataset B | ||
Record_ | ID | Qty_ |
7 | ABC | -30 |
8 | ABC | -30 |
9 | MNO | -30 |
10 | PQR | -30 |
11 | XYZ | -30 |
10 | XYZ | -30 |
The desired outcome is a merged dataset with following pairs:
ID | Record | Record_ | Qty | Qty_ |
ABC | 1 | 7 | 30 | -30 |
MNO | 3 | 9 | 30 | -30 |
XYZ | 5 | 11 | 30 | -30 |
XYZ | 6 | 10 | 30 | -30 |
You need something Special . It is from Tom .
data A; input Record ID $ Qty; cards; 1 ABC 30 2 EFG 30 3 MNO 30 4 MNO 30 5 XYZ 30 6 XYZ 30 ; run; data B; input Record_ ID $ Qty_; cards; 7 ABC -30 8 ABC -30 9 MNO -30 10 PQR -30 11 XYZ -30 10 XYZ -30 ; run; data want; ina=0;inb=0; merge a(in=ina) b(in=inb); by id; if ina and inb; run;
Ksharp
Awesome! Exactly what I was looking for. Thanks, Ksharp!
Is there a Proc Sql equivalent of this?
I think it is hard for SQL. Even sql can do that, there must be lots of sql statements.
Do you have to use SQL ? Why not simple MERGE statement ?
Maybe OP needs something in Pass-through. I agree with Ksharp, it seems impossible for SQL, which is not designed for sequential process. It is already a bit unconventional approach (or as we often call it wicked for Tom's ingenious solution) even for data step merge. Here is an option I believe still within the normal range of Hash() implementation, and if I may, no sort needed.
data A;
input Record ID $ Qty;
cards;
1 ABC 30
2 EFG 30
3 MNO 30
4 MNO 30
5 XYZ 30
6 XYZ 30
;
run;
data B;
input Record_ ID $ Qty_;
cards;
7 ABC -30
8 ABC -30
9 MNO -30
10 PQR -30
11 XYZ -30
10 XYZ -30
;
run;
data want;
if _n_=1 then do;
if 0 then set b;
declare hash b(dataset:'b', multidata:'y');
b.definekey('id');
b.definedata(all:'y');
b.definedone();
end;
set a;
rc=b.find();
if rc=0 then do;
output;
rc=b.removedup();
end;
run;
Haikuo
Yes, I agree. A simple merge statement should be fine for this purpose. I anyways will need to sort the data by ID and date. Thanks again!
Hi ,
I guess this whole logic is working because of the highlighted options below.
Could you please explain the use of this ????
data want;
ina=0;inb=0;
merge a(in=ina) b(in=inb);
by id;
if ina and inb;
run;
Regards
There are a bunch of rules that you have to integrate, to see why this works. Basically, they center around how and when IN= variables get set, and how MERGE works.
1. IN= variables get set to 0 when beginning a new BY value (in this case, a new ID).
2. IN= variables get set to 1 when reading an observation from the source data set.
3. MERGE reads each observation only once. In a many-to-one MERGE, the observation gets read once from the "one" data set, and retained. It does not get re-read as additional observations get read from the other data set, for the same ID.
4. You are allowed to refer to IN= variables in your programming statements, including setting the values as is done here.
So, put these all together. Here is just one example. When A contains multiple records for an ID, but B contains only one, the software sets both IN= variables to 0 as it begins a new ID. It reads the first observation from A, and sets INA to 1. Then it reads the first (and only) observation from B, and sets inB to 1. That first observation passes the subsetting IF test, and gets output.
Next, the programming statements set both IN= variables to 0. The software reads the second observation from A, and sets INA to 1. It reads nothing from B, since there is no second observation for the current ID. So INB remains 0. That observation does not pass the subsetting IF test, and gets deleted.
Hope this is all clear. The programming looks simple, but it takes advantage of a complex process.
Good luck.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.