Help using Base SAS procedures

One-to-one merge and then force out

Reply
Occasional Contributor
Posts: 13

One-to-one merge and then force out

I have two datasets A and B.

Multiple records from A could merge to same record in B.

I want to merge A with B such that if the first record from A merges with first record from B, I want to make the first record from B unavailable for merging with second record from A, even if it can merge.

For example, lets say A has 2 records (1 & 2) and B has 2 records (3 & 4). Based on the variables that I'm merging on, both the records from A can merge with both the records from B.

I want to merge 1 with 3 and then make 3 unavilable (or force out of the database) so that 2 can merge with 4.

I've resolved the issue using lag approach, however, I'm interested in using merge approach.

Please share your ideas.
Contributor
Posts: 74

Re: One-to-one merge and then force out

from the simple data example you provided, seems that you may use line-by-line merge, which means line 1 in A merge with line1 in B, line 2 in A merge with line 2 in B, etc. I will not recommend this though, since data could be much more complicated. the more robust way, I think, may be adding more variable as key for merging. For example, you may create a count variable to record No. 1, 2, 3 record in A under your merging by variable, and do same to B, and add this count variable to the merging by variable list.
Super Contributor
Posts: 359

Re: One-to-one merge and then force out

I think that you will find that several of us are having trouble figuring out exactly what you are asking. Perhaps a more robust example would help.
Occasional Contributor
Posts: 13

Re: One-to-one merge and then force out

I have 2 datasets - dataset A and B, as shown below. When I merge by ID, I get the resulting dataset, also shown below. What I would like to have, is for record 1 in dataset A to merge with record_ 3 in dataset B, and record 2 with record_ 4, so that record 1 doesn't go ahead and merge with record_ 4, and record_ 3 is not available for merging with record 2. This is shown below in "Desired Merged Dataset".

Thanks in advance!

Dataset A
RecordIDQty
1ABC30
2ABC30
Dataset B
Record_IDQty_
3ABC-30
4ABC-30
Resulting Merged Dataset
IDRecordRecord_QtyQty_
ABC1330-30
ABC1430-30
ABC2330-30
ABC2430-30
Desired Merged Dataset
IDRecordRecord_QtyQty_
ABC1330-30
ABC2430-30
Respected Advisor
Posts: 3,156

Re: One-to-one merge and then force out

I guess what you did is SQL "join" instead of data step "merge", so here is the merge:

data A;

input Record    ID:$    Qty;

cards;   

1    ABC    30       

2    ABC    30       

;               

Data B;       

input Record_    ID:$    Qty_;

cards;   

3    ABC    -30       

4    ABC    -30

;

data c;

merge a b;

by id;

run;

proc print;run;

Haikuo

Super User
Posts: 10,028

Re: One-to-one merge and then force out

It looks like you are searching the MERGE statement without BY .

data c;

merge a b;

run;

Respected Advisor
Posts: 3,156

Re: One-to-one merge and then force out

If both tables have same amount of obs, then "merge without by" = "set; set;"

data c;

set a;

set b;

run;

Haikuo

Occasional Contributor
Posts: 13

Re: One-to-one merge and then force out

Thanks for your reply. I agree that your suggestions would achieve the desired outcome when there are equal no. of observation per ID in the datasets. However, in my case, the dataset may not have same amount of observations for all IDs, see for example:

Dataset A
RecordIDQty
1ABC30
2EFG30
3MNO30
4MNO30
5XYZ30
6XYZ30
Dataset B
Record_IDQty_
7ABC-30
8ABC-30
9MNO-30
10PQR-30
11XYZ-30
10XYZ-30

The desired outcome is a merged dataset with following pairs:

IDRecordRecord_QtyQty_
ABC1730-30
MNO3930-30
XYZ51130-30
XYZ61030-30
Super User
Posts: 10,028

Re: One-to-one merge and then force out

You need something Special . It is from Tom .

data A;
input Record     ID     $ Qty;
cards;
1     ABC     30
2     EFG     30
3     MNO     30
4     MNO     30
5     XYZ     30
6     XYZ     30
;
run;
          
data B;
input Record_     ID $     Qty_;
cards;
7     ABC     -30
8     ABC     -30
9     MNO     -30
10     PQR     -30
11     XYZ     -30
10     XYZ     -30
;
run;
data want;
ina=0;inb=0;
 merge a(in=ina) b(in=inb);
 by id;
 if ina and inb;
run;

Ksharp

Occasional Contributor
Posts: 13

Re: One-to-one merge and then force out

Awesome! Exactly what I was looking for. Thanks, Ksharp!

Is there a Proc Sql equivalent of this?

Super User
Posts: 10,028

Re: One-to-one merge and then force out

I think it is hard for SQL. Even sql can do that, there must be lots of sql statements.

Do you have to use SQL ? Why not simple MERGE statement ?

Respected Advisor
Posts: 3,156

Re: One-to-one merge and then force out

Maybe OP needs something in Pass-through. I agree with Ksharp, it seems impossible for SQL, which is not designed for sequential process. It is already a bit unconventional approach (or as we often call it wicked Smiley Happy for Tom's ingenious solution) even for data step merge. Here is an option I believe still within the normal range of Hash() implementation, and if I may, no sort needed.

data A;

input Record     ID     $ Qty;

cards;

1     ABC     30

2     EFG     30

3     MNO     30

4     MNO     30

5     XYZ     30

6     XYZ     30

;

run;

         

data B;

input Record_     ID $     Qty_;

cards;

7     ABC     -30

8     ABC     -30

9     MNO     -30

10     PQR     -30

11     XYZ     -30

10     XYZ     -30

;

run;

data want;

  if _n_=1 then do;

     if 0 then set b;

        declare hash b(dataset:'b', multidata:'y');

        b.definekey('id');

        b.definedata(all:'y');

        b.definedone();

  end;

  set a;

  rc=b.find();

  if rc=0 then do;

     output;

       rc=b.removedup();

  end;

  run;

Haikuo

Occasional Contributor
Posts: 13

Re: One-to-one merge and then force out

Yes, I agree. A simple merge statement should be fine for this purpose. I anyways will need to sort the data by ID and date. Thanks again!

Super Contributor
Posts: 1,041

Re: One-to-one merge and then force out

Hi ,

I guess this whole logic is working because of the highlighted options below.

Could you please explain the use of this ????

data want;

ina=0;inb=0;

merge a(in=ina) b(in=inb);

by id;

if ina and inb;

run;

Regards

Super User
Posts: 5,505

Re: One-to-one merge and then force out

Posted in reply to robertrao

There are a bunch of rules that you have to integrate, to see why this works.  Basically, they center around how and when IN= variables get set, and how MERGE works.

1. IN= variables get set to 0 when beginning a new BY value (in this case, a new ID).

2. IN= variables get set to 1 when reading an observation from the source data set.

3. MERGE reads each observation only once.  In a many-to-one MERGE, the observation gets read once from the "one" data set, and retained.  It does not get re-read as additional observations get read from the other data set, for the same ID.

4. You are allowed to refer to IN= variables in your programming statements, including setting the values as is done here.

So, put these all together.  Here is just one example.  When A contains multiple records for an ID, but B contains only one, the software sets both IN= variables to 0 as it begins a new ID.  It reads the first observation from A, and sets INA to 1.  Then it reads the first (and only) observation from B, and sets inB to 1.  That first observation passes the subsetting IF test, and gets output.

Next, the programming statements set both IN= variables to 0.  The software reads the second observation from A, and sets INA to 1.  It reads nothing from B, since there is no second observation for the current ID.  So INB remains 0.  That observation does not pass the subsetting IF test, and gets deleted.

Hope this is all clear.  The programming looks simple, but it takes advantage of a complex process.

Good luck.

Ask a Question
Discussion stats
  • 16 replies
  • 784 views
  • 6 likes
  • 7 in conversation