BookmarkSubscribeRSS Feed
ChrisNZ
Tourmaline | Level 20
I always thought that when a variable is present in both tables, the latest (by order of appearance in the merge statement) table's value overwrites the other values in a merge statement. Not so it seems:



[pre]
proc sort data=SASHELP.CLASS out=TEST; * Build TEST table ;
by AGE; * WEIGHT is only populated for ;
run; * the last record of each AGE group ;
data TEST;
set TEST;
by AGE;
if not last.AGE then WEIGHT=.;
run;




data RESULT; * Merge one: one-to-many merge fails ;
merge TEST
TEST(keep=AGE WEIGHT where=(WEIGHT ne .));
by AGE;
run; * Second table values written to first group record only;




data RESULT; * Merge two: one-to-many merge succeeds ;
merge TEST(drop=WEIGHT)
TEST(keep=AGE WEIGHT where=(WEIGHT ne .));
by AGE;
run; * Second table values written to all records;
[/pre]

Can anyone explain this?
5 REPLIES 5
CurtisMack
Fluorite | Level 6
This is because the value that is kept is the last one read into the data vector. That is only partially reflected by the order of the merged datasets. Merge is not a cartesian join. If you want that use PROC SQL.

Take age 12 in your data for example,

First the James record is read from the first dataset with a missing wieght.
The second dataset has a matching value of Age, so it is read in with a wieght of 128.
The row is written.
Next the Jane record is read from the first table with its missing value for wieght, this overwrites the value of 128.
There is not another record for Age 12 in the second table, so nothing is read.
The row is written.
Next the John record is read from the first table with its missing value for wieght.
There is not another record for Age 12 in the second table, so nothing is read.
The row is written.
And so on.

Your solution of adding the drop statement prevents the first table from overwritting the value wieght, so the value from the second table is retain for all of the rows.

Clear as Mud?

Message was edited by: Curtis Mack Message was edited by: Curtis Mack
ChrisNZ
Tourmaline | Level 20
Thanks Curtis.

Rows are read only once (if applicable) and retained (and potentially over-written), not read over and over as SQL does.

It makes perfect sense now.

A cartesian product is probably more useful (as well as more resource-intensive) than this behaviour, but as long as we are aware of what takes place ...
Andre
Obsidian | Level 7
Chris,

Sas institute produce an interresting and short text about this question

see rule 1 on page 1 in

The fundamentals of merge

support.sas.com/techsup/technote/ts705.pdf

HTH
Andre
er_awasthi
Calcite | Level 5
Great document Andre .

Thanx for Sharing It 🙂

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 791 views
  • 0 likes
  • 4 in conversation