BookmarkSubscribeRSS Feed
Sas_Geek
Calcite | Level 5
I have two datasets A and B.

Multiple records from A could merge to same record in B.

I want to merge A with B such that if the first record from A merges with first record from B, I want to make the first record from B unavailable for merging with second record from A, even if it can merge.

For example, lets say A has 2 records (1 & 2) and B has 2 records (3 & 4). Based on the variables that I'm merging on, both the records from A can merge with both the records from B.

I want to merge 1 with 3 and then make 3 unavilable (or force out of the database) so that 2 can merge with 4.

I've resolved the issue using lag approach, however, I'm interested in using merge approach.

Please share your ideas.
16 REPLIES 16
abdullala
Calcite | Level 5
from the simple data example you provided, seems that you may use line-by-line merge, which means line 1 in A merge with line1 in B, line 2 in A merge with line 2 in B, etc. I will not recommend this though, since data could be much more complicated. the more robust way, I think, may be adding more variable as key for merging. For example, you may create a count variable to record No. 1, 2, 3 record in A under your merging by variable, and do same to B, and add this count variable to the merging by variable list.
Flip
Fluorite | Level 6
I think that you will find that several of us are having trouble figuring out exactly what you are asking. Perhaps a more robust example would help.
Sas_Geek
Calcite | Level 5

I have 2 datasets - dataset A and B, as shown below. When I merge by ID, I get the resulting dataset, also shown below. What I would like to have, is for record 1 in dataset A to merge with record_ 3 in dataset B, and record 2 with record_ 4, so that record 1 doesn't go ahead and merge with record_ 4, and record_ 3 is not available for merging with record 2. This is shown below in "Desired Merged Dataset".

Thanks in advance!

Dataset A
RecordIDQty
1ABC30
2ABC30
Dataset B
Record_IDQty_
3ABC-30
4ABC-30
Resulting Merged Dataset
IDRecordRecord_QtyQty_
ABC1330-30
ABC1430-30
ABC2330-30
ABC2430-30
Desired Merged Dataset
IDRecordRecord_QtyQty_
ABC1330-30
ABC2430-30
Haikuo
Onyx | Level 15

I guess what you did is SQL "join" instead of data step "merge", so here is the merge:

data A;

input Record    ID:$    Qty;

cards;   

1    ABC    30       

2    ABC    30       

;               

Data B;       

input Record_    ID:$    Qty_;

cards;   

3    ABC    -30       

4    ABC    -30

;

data c;

merge a b;

by id;

run;

proc print;run;

Haikuo

Ksharp
Super User

It looks like you are searching the MERGE statement without BY .

data c;

merge a b;

run;

Haikuo
Onyx | Level 15

If both tables have same amount of obs, then "merge without by" = "set; set;"

data c;

set a;

set b;

run;

Haikuo

Sas_Geek
Calcite | Level 5

Thanks for your reply. I agree that your suggestions would achieve the desired outcome when there are equal no. of observation per ID in the datasets. However, in my case, the dataset may not have same amount of observations for all IDs, see for example:

Dataset A
RecordIDQty
1ABC30
2EFG30
3MNO30
4MNO30
5XYZ30
6XYZ30
Dataset B
Record_IDQty_
7ABC-30
8ABC-30
9MNO-30
10PQR-30
11XYZ-30
10XYZ-30

The desired outcome is a merged dataset with following pairs:

IDRecordRecord_QtyQty_
ABC1730-30
MNO3930-30
XYZ51130-30
XYZ61030-30
Ksharp
Super User

You need something Special . It is from Tom .

data A;
input Record     ID     $ Qty;
cards;
1     ABC     30
2     EFG     30
3     MNO     30
4     MNO     30
5     XYZ     30
6     XYZ     30
;
run;
          
data B;
input Record_     ID $     Qty_;
cards;
7     ABC     -30
8     ABC     -30
9     MNO     -30
10     PQR     -30
11     XYZ     -30
10     XYZ     -30
;
run;
data want;
ina=0;inb=0;
 merge a(in=ina) b(in=inb);
 by id;
 if ina and inb;
run;

Ksharp

Sas_Geek
Calcite | Level 5

Awesome! Exactly what I was looking for. Thanks, Ksharp!

Is there a Proc Sql equivalent of this?

Ksharp
Super User

I think it is hard for SQL. Even sql can do that, there must be lots of sql statements.

Do you have to use SQL ? Why not simple MERGE statement ?

Haikuo
Onyx | Level 15

Maybe OP needs something in Pass-through. I agree with Ksharp, it seems impossible for SQL, which is not designed for sequential process. It is already a bit unconventional approach (or as we often call it wicked Smiley Happy for Tom's ingenious solution) even for data step merge. Here is an option I believe still within the normal range of Hash() implementation, and if I may, no sort needed.

data A;

input Record     ID     $ Qty;

cards;

1     ABC     30

2     EFG     30

3     MNO     30

4     MNO     30

5     XYZ     30

6     XYZ     30

;

run;

         

data B;

input Record_     ID $     Qty_;

cards;

7     ABC     -30

8     ABC     -30

9     MNO     -30

10     PQR     -30

11     XYZ     -30

10     XYZ     -30

;

run;

data want;

  if _n_=1 then do;

     if 0 then set b;

        declare hash b(dataset:'b', multidata:'y');

        b.definekey('id');

        b.definedata(all:'y');

        b.definedone();

  end;

  set a;

  rc=b.find();

  if rc=0 then do;

     output;

       rc=b.removedup();

  end;

  run;

Haikuo

Sas_Geek
Calcite | Level 5

Yes, I agree. A simple merge statement should be fine for this purpose. I anyways will need to sort the data by ID and date. Thanks again!

robertrao
Quartz | Level 8

Hi ,

I guess this whole logic is working because of the highlighted options below.

Could you please explain the use of this ????

data want;

ina=0;inb=0;

merge a(in=ina) b(in=inb);

by id;

if ina and inb;

run;

Regards

Astounding
PROC Star

There are a bunch of rules that you have to integrate, to see why this works.  Basically, they center around how and when IN= variables get set, and how MERGE works.

1. IN= variables get set to 0 when beginning a new BY value (in this case, a new ID).

2. IN= variables get set to 1 when reading an observation from the source data set.

3. MERGE reads each observation only once.  In a many-to-one MERGE, the observation gets read once from the "one" data set, and retained.  It does not get re-read as additional observations get read from the other data set, for the same ID.

4. You are allowed to refer to IN= variables in your programming statements, including setting the values as is done here.

So, put these all together.  Here is just one example.  When A contains multiple records for an ID, but B contains only one, the software sets both IN= variables to 0 as it begins a new ID.  It reads the first observation from A, and sets INA to 1.  Then it reads the first (and only) observation from B, and sets inB to 1.  That first observation passes the subsetting IF test, and gets output.

Next, the programming statements set both IN= variables to 0.  The software reads the second observation from A, and sets INA to 1.  It reads nothing from B, since there is no second observation for the current ID.  So INB remains 0.  That observation does not pass the subsetting IF test, and gets deleted.

Hope this is all clear.  The programming looks simple, but it takes advantage of a complex process.

Good luck.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 3172 views
  • 6 likes
  • 7 in conversation