BookmarkSubscribeRSS Feed
ChrisNZ
Tourmaline | Level 20
I always thought that when a variable is present in both tables, the latest (by order of appearance in the merge statement) table's value overwrites the other values in a merge statement. Not so it seems:



[pre]
proc sort data=SASHELP.CLASS out=TEST; * Build TEST table ;
by AGE; * WEIGHT is only populated for ;
run; * the last record of each AGE group ;
data TEST;
set TEST;
by AGE;
if not last.AGE then WEIGHT=.;
run;




data RESULT; * Merge one: one-to-many merge fails ;
merge TEST
TEST(keep=AGE WEIGHT where=(WEIGHT ne .));
by AGE;
run; * Second table values written to first group record only;




data RESULT; * Merge two: one-to-many merge succeeds ;
merge TEST(drop=WEIGHT)
TEST(keep=AGE WEIGHT where=(WEIGHT ne .));
by AGE;
run; * Second table values written to all records;
[/pre]

Can anyone explain this?
5 REPLIES 5
CurtisMack
Fluorite | Level 6
This is because the value that is kept is the last one read into the data vector. That is only partially reflected by the order of the merged datasets. Merge is not a cartesian join. If you want that use PROC SQL.

Take age 12 in your data for example,

First the James record is read from the first dataset with a missing wieght.
The second dataset has a matching value of Age, so it is read in with a wieght of 128.
The row is written.
Next the Jane record is read from the first table with its missing value for wieght, this overwrites the value of 128.
There is not another record for Age 12 in the second table, so nothing is read.
The row is written.
Next the John record is read from the first table with its missing value for wieght.
There is not another record for Age 12 in the second table, so nothing is read.
The row is written.
And so on.

Your solution of adding the drop statement prevents the first table from overwritting the value wieght, so the value from the second table is retain for all of the rows.

Clear as Mud?

Message was edited by: Curtis Mack Message was edited by: Curtis Mack
ChrisNZ
Tourmaline | Level 20
Thanks Curtis.

Rows are read only once (if applicable) and retained (and potentially over-written), not read over and over as SQL does.

It makes perfect sense now.

A cartesian product is probably more useful (as well as more resource-intensive) than this behaviour, but as long as we are aware of what takes place ...
Andre
Obsidian | Level 7
Chris,

Sas institute produce an interresting and short text about this question

see rule 1 on page 1 in

The fundamentals of merge

support.sas.com/techsup/technote/ts705.pdf

HTH
Andre
er_awasthi
Calcite | Level 5
Great document Andre .

Thanx for Sharing It 🙂

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1506 views
  • 0 likes
  • 4 in conversation