11-11-2009 02:14 PM
11-11-2009 03:28 PM
11-11-2009 03:32 PM
09-27-2012 04:03 PM
I have 2 datasets - dataset A and B, as shown below. When I merge by ID, I get the resulting dataset, also shown below. What I would like to have, is for record 1 in dataset A to merge with record_ 3 in dataset B, and record 2 with record_ 4, so that record 1 doesn't go ahead and merge with record_ 4, and record_ 3 is not available for merging with record 2. This is shown below in "Desired Merged Dataset".
Thanks in advance!
|Resulting Merged Dataset|
|Desired Merged Dataset|
09-27-2012 07:04 PM
I guess what you did is SQL "join" instead of data step "merge", so here is the merge:
input Record ID:$ Qty;
1 ABC 30
2 ABC 30
input Record_ ID:$ Qty_;
3 ABC -30
4 ABC -30
merge a b;
10-01-2012 03:15 AM
Thanks for your reply. I agree that your suggestions would achieve the desired outcome when there are equal no. of observation per ID in the datasets. However, in my case, the dataset may not have same amount of observations for all IDs, see for example:
The desired outcome is a merged dataset with following pairs:
10-01-2012 09:58 AM
You need something Special . It is from Tom .
data A; input Record ID $ Qty; cards; 1 ABC 30 2 EFG 30 3 MNO 30 4 MNO 30 5 XYZ 30 6 XYZ 30 ; run; data B; input Record_ ID $ Qty_; cards; 7 ABC -30 8 ABC -30 9 MNO -30 10 PQR -30 11 XYZ -30 10 XYZ -30 ; run; data want; ina=0;inb=0; merge a(in=ina) b(in=inb); by id; if ina and inb; run;
10-02-2012 02:58 AM
I think it is hard for SQL. Even sql can do that, there must be lots of sql statements.
Do you have to use SQL ? Why not simple MERGE statement ?
10-02-2012 09:11 AM
Maybe OP needs something in Pass-through. I agree with Ksharp, it seems impossible for SQL, which is not designed for sequential process. It is already a bit unconventional approach (or as we often call it wicked for Tom's ingenious solution) even for data step merge. Here is an option I believe still within the normal range of Hash() implementation, and if I may, no sort needed.
input Record ID $ Qty;
1 ABC 30
2 EFG 30
3 MNO 30
4 MNO 30
5 XYZ 30
6 XYZ 30
input Record_ ID $ Qty_;
7 ABC -30
8 ABC -30
9 MNO -30
10 PQR -30
11 XYZ -30
10 XYZ -30
if _n_=1 then do;
if 0 then set b;
declare hash b(dataset:'b', multidata:'y');
if rc=0 then do;
10-04-2012 08:07 AM
Yes, I agree. A simple merge statement should be fine for this purpose. I anyways will need to sort the data by ID and date. Thanks again!
11-12-2012 10:35 AM
I guess this whole logic is working because of the highlighted options below.
Could you please explain the use of this ????
merge a(in=ina) b(in=inb);
if ina and inb;
11-12-2012 11:05 AM
There are a bunch of rules that you have to integrate, to see why this works. Basically, they center around how and when IN= variables get set, and how MERGE works.
1. IN= variables get set to 0 when beginning a new BY value (in this case, a new ID).
2. IN= variables get set to 1 when reading an observation from the source data set.
3. MERGE reads each observation only once. In a many-to-one MERGE, the observation gets read once from the "one" data set, and retained. It does not get re-read as additional observations get read from the other data set, for the same ID.
4. You are allowed to refer to IN= variables in your programming statements, including setting the values as is done here.
So, put these all together. Here is just one example. When A contains multiple records for an ID, but B contains only one, the software sets both IN= variables to 0 as it begins a new ID. It reads the first observation from A, and sets INA to 1. Then it reads the first (and only) observation from B, and sets inB to 1. That first observation passes the subsetting IF test, and gets output.
Next, the programming statements set both IN= variables to 0. The software reads the second observation from A, and sets INA to 1. It reads nothing from B, since there is no second observation for the current ID. So INB remains 0. That observation does not pass the subsetting IF test, and gets deleted.
Hope this is all clear. The programming looks simple, but it takes advantage of a complex process.