DATA Step, Macro, Functions and more

Many to one merge with overlapping variables

Accepted Solution Solved
Reply
Regular Contributor
Posts: 194
Accepted Solution

Many to one merge with overlapping variables

Hi all,

Here is an exampe of what i would like to do

data A;

input @1 id @3 x;

cards;

1 0

1 0

1 0

2 0

2 0

3 0

3 0

3 0

;

run;

data B;

input @1 id  @3 x;

cards;

1 1

2 3

;

run;

Dataset B has at most one observation per id whereas dataset A has (possibly multiple) obervations for all ids.

I would like to update colum x from A with column x from B so that A finally contains

1 1

1 1

1 1

2 3

2 3

3 3

3 0

3 0

To be more precise, i need to do this for several columns and not just one as in the example.

So far, I obtained the result using something like :

proc sort data=A; by id; run;

proc sort data=B; by id; run;

data A;

  merge A (in=a) B(in=b rename=(x=x2));
  by id;

  if b then x=x2;

  if a;

  drop x2;

run;

Is there a better way to get the same result ?

By better, I mean :

- more efficient (dataset A will contain between 30 000 and 50 000 records, B is typically much

smaller - a few hundreds, the update concerns a dozen of variables)

- more elegant. In particular, I am not very pleased with the need to rename the variables

although they have the same name in both datasets.

Thanks


Accepted Solutions
Solution
‎02-13-2014 09:27 AM
Contributor
Posts: 43

Re: Many to one merge with overlapping variables

You could use PROC SQL.

PROC SQL ;

     create table A_updated as

     select A.id, COALESCE(B.x, A.x) as x

     from A

     left outer join B on A.id = B.id ;

quit ;

It has the benefit of not needing to pre-sort the two datasets, so you save some CPU and I/O there.

I have been wanting to run some independent tests to see the CPU and I/O costs for the two different methods, but haven't had the time.

You might report back on your tests.

Note:  if you have multiple variables that need to be updated at the same time, you can do them all at once, using the COALESCE on each variable.

so if table A has variables id, x, y and z, and table B has the same variables:

PROC SQL ;

     create table A_updated as

     select A.id

     , COALESCE(B.x, A.x) as x

     , COALESCE(B.y, A.y) as y

     , COALESCE(B.z, A.z) as z

     from A

     left outer join B on A.id = B.id ;

quit ;

View solution in original post


All Replies
New Contributor
Posts: 4

Re: Many to one merge with overlapping variables

Try making a format out of your smaller data using proc format and cntlin= option.

Then apply that format to your key variables in the big dataset to create the values for the variable you are after.

Solution
‎02-13-2014 09:27 AM
Contributor
Posts: 43

Re: Many to one merge with overlapping variables

You could use PROC SQL.

PROC SQL ;

     create table A_updated as

     select A.id, COALESCE(B.x, A.x) as x

     from A

     left outer join B on A.id = B.id ;

quit ;

It has the benefit of not needing to pre-sort the two datasets, so you save some CPU and I/O there.

I have been wanting to run some independent tests to see the CPU and I/O costs for the two different methods, but haven't had the time.

You might report back on your tests.

Note:  if you have multiple variables that need to be updated at the same time, you can do them all at once, using the COALESCE on each variable.

so if table A has variables id, x, y and z, and table B has the same variables:

PROC SQL ;

     create table A_updated as

     select A.id

     , COALESCE(B.x, A.x) as x

     , COALESCE(B.y, A.y) as y

     , COALESCE(B.z, A.z) as z

     from A

     left outer join B on A.id = B.id ;

quit ;

Occasional Contributor
Posts: 19

Re: Many to one merge with overlapping variables

All,

* Here is my table A ;

data A;
input @1 QUE1 $ @6 QUE2 $ @11 QUE3 $ @16 QUE4 $ @21 UNIQUE$;
cards;
0001 0011 0003 0011 ABCDERG1
0002 6934 0006 6934 ABCDERG2
0003 0007 6934 0010 ABCDERG3
0004 6935 6934 0014 ABCDERG4
0005 6934 0014 6934 ABCDERG4
0006 0002 0015 6935 ABCDERG5
0007 0015 6934 0015 ABCDERG6
0008 0017 6934 0017 ABCDERG7
0009 0018 0018 0018 ABCDERG8
6934 6934 0017 6935 ABCDERG9
6935 0007 6935 0017 ABCDERG0
;
run;

* Here is my table B ;

data B;
input @1 QUE_ALL $   AMT 5.2;
cards;
0001 7.74
0002 2.90
0003 8.74
0004 6.74
0005 1.90
0006 13.74
0007 0.00
0008 6.00
0009 9.90
0010 9.74
0011 0.90
0012 5.90
0013 9.90
0014 10.24
0015 14.24
0016 17.24
0017 10.74
0018 14.74
0019 12.74
0020 11.74
6932 4.24
6933 4.49
6934 4.74
6935 4.99
6936 5.24
;
run;


* Below is the output table I am looking for
we need to merge QUE1, QUE2 , QUE3, QUE4 from table A to QUE_ALL from table B
and create the new variable AMT1, AMT2, AMT3, AMT4 accordingly;


QUE1 AMT1  QUE2  AMT2  QUE3  AMT3  UNIQUE
0001   7.74    0011   0.90     0003   8.74    ABCDERG1
0002   2.90    6934   4.74     0006   13.74  ABCDERG2
0003   8.74    0007   0.00     6934   4.74    ABCDERG3
0004   6.74    6935   4.99     6934   4.74    ABCDERG4
0005   1.90    6934   4.74     0014   10.24  ABCDERG4
0006   13.74  0002   2.90     0015   14.24  ABCDERG5
0007   0.00    0015   14.24   6934   4.74    ABCDERG6
0008   6.00    0017   10.74   6934   4.74    ABCDERG7
0009   9.90    0018   14.74   0018   14.74  ABCDERG8
6934   4.74    6934   4.74     0017   10.74  ABCDERG9
6935   5.24    0007   0.00     6935   4.99    ABCDERG0

;
* after spending whole day on this - I thought to post here.;

*Thank you for your help;

Contributor
Posts: 43

Re: Many to one merge with overlapping variables

This is a good example of the usefulness of PROC SQL ;

You can merge to the same dataset multiple times, each time, merging by a different variable.

Notice that I added an alias to each variation on the B table to distinguish the AMT that I want, and I used a LEFT OUTER JOIN to keep the data from A in case it does not match a value in B.

proc sql ;
create table AB as
select A.QUE1, B1.AMT as AMT1,
A.QUE2, B2.AMT as AMT2,
A.QUE3, B3.AMT as AMT3,
A.QUE4, B4.AMT as AMT4,
A.UNIQUE
from A
left outer join B as B1 on A.QUE1 = B1.QUE_ALL
left outer join B as B2 on A.QUE2 = B2.QUE_ALL
left outer join B as B3 on A.QUE3 = B3.QUE_ALL
left outer join B as B4 on A.QUE4 = B4.QUE_ALL

order by A.UNIQUE
;
quit ;

proc print data = AB ;
run ;

Occasional Contributor
Posts: 19

Re: Many to one merge with overlapping variables

Thank you Cwilson Smiley Happy thats what I was looking for. Perfect ...

Regular Contributor
Posts: 194

Re: Many to one merge with overlapping variables

Thanks peterz and cwilson.

I would never have thought to use a format for this purpose, really clever.

I think I will rather use the COALESCE option though as I'm trying to do

something generic (i.e. it is supposed to be part of a macro) and it seems

to me simpler that way.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 1595 views
  • 5 likes
  • 4 in conversation