BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
gamotte
Rhodochrosite | Level 12

Hi all,

Here is an exampe of what i would like to do

data A;

input @1 id @3 x;

cards;

1 0

1 0

1 0

2 0

2 0

3 0

3 0

3 0

;

run;

data B;

input @1 id  @3 x;

cards;

1 1

2 3

;

run;

Dataset B has at most one observation per id whereas dataset A has (possibly multiple) obervations for all ids.

I would like to update colum x from A with column x from B so that A finally contains

1 1

1 1

1 1

2 3

2 3

3 3

3 0

3 0

To be more precise, i need to do this for several columns and not just one as in the example.

So far, I obtained the result using something like :

proc sort data=A; by id; run;

proc sort data=B; by id; run;

data A;

  merge A (in=a) B(in=b rename=(x=x2));
  by id;

  if b then x=x2;

  if a;

  drop x2;

run;

Is there a better way to get the same result ?

By better, I mean :

- more efficient (dataset A will contain between 30 000 and 50 000 records, B is typically much

smaller - a few hundreds, the update concerns a dozen of variables)

- more elegant. In particular, I am not very pleased with the need to rename the variables

although they have the same name in both datasets.

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
cwilson
Calcite | Level 5

You could use PROC SQL.

PROC SQL ;

     create table A_updated as

     select A.id, COALESCE(B.x, A.x) as x

     from A

     left outer join B on A.id = B.id ;

quit ;

It has the benefit of not needing to pre-sort the two datasets, so you save some CPU and I/O there.

I have been wanting to run some independent tests to see the CPU and I/O costs for the two different methods, but haven't had the time.

You might report back on your tests.

Note:  if you have multiple variables that need to be updated at the same time, you can do them all at once, using the COALESCE on each variable.

so if table A has variables id, x, y and z, and table B has the same variables:

PROC SQL ;

     create table A_updated as

     select A.id

     , COALESCE(B.x, A.x) as x

     , COALESCE(B.y, A.y) as y

     , COALESCE(B.z, A.z) as z

     from A

     left outer join B on A.id = B.id ;

quit ;

View solution in original post

6 REPLIES 6
peterz
Calcite | Level 5

Try making a format out of your smaller data using proc format and cntlin= option.

Then apply that format to your key variables in the big dataset to create the values for the variable you are after.

cwilson
Calcite | Level 5

You could use PROC SQL.

PROC SQL ;

     create table A_updated as

     select A.id, COALESCE(B.x, A.x) as x

     from A

     left outer join B on A.id = B.id ;

quit ;

It has the benefit of not needing to pre-sort the two datasets, so you save some CPU and I/O there.

I have been wanting to run some independent tests to see the CPU and I/O costs for the two different methods, but haven't had the time.

You might report back on your tests.

Note:  if you have multiple variables that need to be updated at the same time, you can do them all at once, using the COALESCE on each variable.

so if table A has variables id, x, y and z, and table B has the same variables:

PROC SQL ;

     create table A_updated as

     select A.id

     , COALESCE(B.x, A.x) as x

     , COALESCE(B.y, A.y) as y

     , COALESCE(B.z, A.z) as z

     from A

     left outer join B on A.id = B.id ;

quit ;

Ranny
Obsidian | Level 7

All,

* Here is my table A ;

data A;
input @1 QUE1 $ @6 QUE2 $ @11 QUE3 $ @16 QUE4 $ @21 UNIQUE$;
cards;
0001 0011 0003 0011 ABCDERG1
0002 6934 0006 6934 ABCDERG2
0003 0007 6934 0010 ABCDERG3
0004 6935 6934 0014 ABCDERG4
0005 6934 0014 6934 ABCDERG4
0006 0002 0015 6935 ABCDERG5
0007 0015 6934 0015 ABCDERG6
0008 0017 6934 0017 ABCDERG7
0009 0018 0018 0018 ABCDERG8
6934 6934 0017 6935 ABCDERG9
6935 0007 6935 0017 ABCDERG0
;
run;

* Here is my table B ;

data B;
input @1 QUE_ALL $   AMT 5.2;
cards;
0001 7.74
0002 2.90
0003 8.74
0004 6.74
0005 1.90
0006 13.74
0007 0.00
0008 6.00
0009 9.90
0010 9.74
0011 0.90
0012 5.90
0013 9.90
0014 10.24
0015 14.24
0016 17.24
0017 10.74
0018 14.74
0019 12.74
0020 11.74
6932 4.24
6933 4.49
6934 4.74
6935 4.99
6936 5.24
;
run;


* Below is the output table I am looking for
we need to merge QUE1, QUE2 , QUE3, QUE4 from table A to QUE_ALL from table B
and create the new variable AMT1, AMT2, AMT3, AMT4 accordingly;


QUE1 AMT1  QUE2  AMT2  QUE3  AMT3  UNIQUE
0001   7.74    0011   0.90     0003   8.74    ABCDERG1
0002   2.90    6934   4.74     0006   13.74  ABCDERG2
0003   8.74    0007   0.00     6934   4.74    ABCDERG3
0004   6.74    6935   4.99     6934   4.74    ABCDERG4
0005   1.90    6934   4.74     0014   10.24  ABCDERG4
0006   13.74  0002   2.90     0015   14.24  ABCDERG5
0007   0.00    0015   14.24   6934   4.74    ABCDERG6
0008   6.00    0017   10.74   6934   4.74    ABCDERG7
0009   9.90    0018   14.74   0018   14.74  ABCDERG8
6934   4.74    6934   4.74     0017   10.74  ABCDERG9
6935   5.24    0007   0.00     6935   4.99    ABCDERG0

;
* after spending whole day on this - I thought to post here.;

*Thank you for your help;

cwilson
Calcite | Level 5

This is a good example of the usefulness of PROC SQL ;

You can merge to the same dataset multiple times, each time, merging by a different variable.

Notice that I added an alias to each variation on the B table to distinguish the AMT that I want, and I used a LEFT OUTER JOIN to keep the data from A in case it does not match a value in B.

proc sql ;
create table AB as
select A.QUE1, B1.AMT as AMT1,
A.QUE2, B2.AMT as AMT2,
A.QUE3, B3.AMT as AMT3,
A.QUE4, B4.AMT as AMT4,
A.UNIQUE
from A
left outer join B as B1 on A.QUE1 = B1.QUE_ALL
left outer join B as B2 on A.QUE2 = B2.QUE_ALL
left outer join B as B3 on A.QUE3 = B3.QUE_ALL
left outer join B as B4 on A.QUE4 = B4.QUE_ALL

order by A.UNIQUE
;
quit ;

proc print data = AB ;
run ;

Ranny
Obsidian | Level 7

Thank you Cwilson Smiley Happy thats what I was looking for. Perfect ...

gamotte
Rhodochrosite | Level 12

Thanks peterz and cwilson.

I would never have thought to use a format for this purpose, really clever.

I think I will rather use the COALESCE option though as I'm trying to do

something generic (i.e. it is supposed to be part of a macro) and it seems

to me simpler that way.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 8973 views
  • 5 likes
  • 4 in conversation