Re: One-to-one merge and then force out

Sas_Geek · Posted 11-11-2009 02:14 PM

I have two datasets A and B.

Multiple records from A could merge to same record in B.

I want to merge A with B such that if the first record from A merges with first record from B, I want to make the first record from B unavailable for merging with second record from A, even if it can merge.

For example, lets say A has 2 records (1 & 2) and B has 2 records (3 & 4). Based on the variables that I'm merging on, both the records from A can merge with both the records from B.

I want to merge 1 with 3 and then make 3 unavilable (or force out of the database) so that 2 can merge with 4.

I've resolved the issue using lag approach, however, I'm interested in using merge approach.

Please share your ideas.

abdullala · Posted 11-11-2009 03:28 PM

from the simple data example you provided, seems that you may use line-by-line merge, which means line 1 in A merge with line1 in B, line 2 in A merge with line 2 in B, etc. I will not recommend this though, since data could be much more complicated. the more robust way, I think, may be adding more variable as key for merging. For example, you may create a count variable to record No. 1, 2, 3 record in A under your merging by variable, and do same to B, and add this count variable to the merging by variable list.

Flip · Posted 11-11-2009 03:32 PM

I think that you will find that several of us are having trouble figuring out exactly what you are asking. Perhaps a more robust example would help.

Sas_Geek · Posted 09-27-2012 04:03 PM

I have 2 datasets - dataset A and B, as shown below. When I merge by ID, I get the resulting dataset, also shown below. What I would like to have, is for record 1 in dataset A to merge with record_ 3 in dataset B, and record 2 with record_ 4, so that record 1 doesn't go ahead and merge with record_ 4, and record_ 3 is not available for merging with record 2. This is shown below in "Desired Merged Dataset".

Thanks in advance!

Dataset A
Record	ID	Qty
1	ABC	30
2	ABC	30

Dataset B
Record_	ID	Qty_
3	ABC	-30
4	ABC	-30


Resulting Merged Dataset
ID	Record	Record_	Qty	Qty_
ABC	1	3	30	-30
ABC	1	4	30	-30
ABC	2	3	30	-30
ABC	2	4	30	-30


Desired Merged Dataset
ID	Record	Record_	Qty	Qty_
ABC	1	3	30	-30
ABC	2	4	30	-30

Haikuo · Posted 09-27-2012 07:04 PM

I guess what you did is SQL "join" instead of data step "merge", so here is the merge:

data A;

input Record ID:$ Qty;

cards;

1 ABC 30

2 ABC 30

;

Data B;

input Record_ ID:$ Qty_;

cards;

3 ABC -30

4 ABC -30

;

data c;

merge a b;

by id;

run;

proc print;run;

Haikuo

Ksharp · Posted 09-27-2012 10:28 PM

It looks like you are searching the MERGE statement without BY .

data c;

merge a b;

run;

Haikuo · Posted 09-27-2012 10:57 PM

If both tables have same amount of obs, then "merge without by" = "set; set;"

data c;

set a;

set b;

run;

Haikuo

Sas_Geek · Posted 10-01-2012 03:15 AM

Thanks for your reply. I agree that your suggestions would achieve the desired outcome when there are equal no. of observation per ID in the datasets. However, in my case, the dataset may not have same amount of observations for all IDs, see for example:

Dataset A
Record	ID	Qty
1	ABC	30
2	EFG	30
3	MNO	30
4	MNO	30
5	XYZ	30
6	XYZ	30

Dataset B
Record_	ID	Qty_
7	ABC	-30
8	ABC	-30
9	MNO	-30
10	PQR	-30
11	XYZ	-30
10	XYZ	-30

The desired outcome is a merged dataset with following pairs:

ID	Record	Record_	Qty	Qty_
ABC	1	7	30	-30
MNO	3	9	30	-30
XYZ	5	11	30	-30
XYZ	6	10	30	-30

Ksharp · Posted 10-01-2012 09:58 AM

You need something Special . It is from Tom .

data A;
input Record     ID     $ Qty;
cards;
1     ABC     30
2     EFG     30
3     MNO     30
4     MNO     30
5     XYZ     30
6     XYZ     30
;
run;
          
data B;
input Record_     ID $     Qty_;
cards;
7     ABC     -30
8     ABC     -30
9     MNO     -30
10     PQR     -30
11     XYZ     -30
10     XYZ     -30
;
run;
data want;
ina=0;inb=0;
 merge a(in=ina) b(in=inb);
 by id;
 if ina and inb;
run;

Ksharp

Sas_Geek · Posted 10-01-2012 12:52 PM

Awesome! Exactly what I was looking for. Thanks, Ksharp!

Is there a Proc Sql equivalent of this?

Ksharp · Posted 10-02-2012 02:58 AM

I think it is hard for SQL. Even sql can do that, there must be lots of sql statements.

Do you have to use SQL ? Why not simple MERGE statement ?

Haikuo · Posted 10-02-2012 09:11 AM

Maybe OP needs something in Pass-through. I agree with Ksharp, it seems impossible for SQL, which is not designed for sequential process. It is already a bit unconventional approach (or as we often call it wicked for Tom's ingenious solution) even for data step merge. Here is an option I believe still within the normal range of Hash() implementation, and if I may, no sort needed.

data A;

input Record ID $ Qty;

cards;

1 ABC 30

2 EFG 30

3 MNO 30

4 MNO 30

5 XYZ 30

6 XYZ 30

;

run;

data B;

input Record_ ID $ Qty_;

cards;

7 ABC -30

8 ABC -30

9 MNO -30

10 PQR -30

11 XYZ -30

10 XYZ -30

;

run;

data want;

if _n_=1 then do;

if 0 then set b;

declare hash b(dataset:'b', multidata:'y');

b.definekey('id');

b.definedata(all:'y');

b.definedone();

end;

set a;

rc=b.find();

if rc=0 then do;

output;

rc=b.removedup();

end;

run;

Haikuo

Sas_Geek · Posted 10-04-2012 08:07 AM

Yes, I agree. A simple merge statement should be fine for this purpose. I anyways will need to sort the data by ID and date. Thanks again!

robertrao · Posted 11-12-2012 10:35 AM

Hi ,

I guess this whole logic is working because of the highlighted options below.

Could you please explain the use of this ????

data want;

ina=0;inb=0;

merge a(in=ina) b(in=inb);

by id;

if ina and inb;

run;

Regards

Astounding · Posted 11-12-2012 11:05 AM

There are a bunch of rules that you have to integrate, to see why this works. Basically, they center around how and when IN= variables get set, and how MERGE works.

1. IN= variables get set to 0 when beginning a new BY value (in this case, a new ID).

2. IN= variables get set to 1 when reading an observation from the source data set.

3. MERGE reads each observation only once. In a many-to-one MERGE, the observation gets read once from the "one" data set, and retained. It does not get re-read as additional observations get read from the other data set, for the same ID.

4. You are allowed to refer to IN= variables in your programming statements, including setting the values as is done here.

So, put these all together. Here is just one example. When A contains multiple records for an ID, but B contains only one, the software sets both IN= variables to 0 as it begins a new ID. It reads the first observation from A, and sets INA to 1. Then it reads the first (and only) observation from B, and sets inB to 1. That first observation passes the subsetting IF test, and gets output.

Next, the programming statements set both IN= variables to 0. The software reads the second observation from A, and sets INA to 1. It reads nothing from B, since there is no second observation for the current ID. So INB remains 0. That observation does not pass the subsetting IF test, and gets deleted.

Hope this is all clear. The programming looks simple, but it takes advantage of a complex process.

Good luck.

2025 SAS Hackathon: There is still time!

SAS Training: Just a Click Away