DATA Step, Macro, Functions and more

Merging SAS data sets

Accepted Solution Solved
Reply
Regular Contributor
Posts: 216
Accepted Solution

Merging SAS data sets

Hi

When you are merging two SAS data sets where both have one or more of the same column (and not used in the 'by' statement: in below example 'age'), which data set's value is retained as the value in the final, merged data set?

Example:

group1:

ID      name     age     startdate

group2:

ID     age     salary

data group3;

merge group1 group2;

by ID;

run;


Accepted Solutions
Solution
‎03-25-2014 11:18 AM
Super User
Posts: 19,878

Re: Merging SAS data sets

CAUTION:
Avoid using duplicate values or different values of common variables. 

One-to-one merging with data sets that contain duplicate values of common variables can produce undesirable results. If a variable exists in more than one data set, the value from the last data set that is read is the one that is written to the new data set. The variables are combined exactly as they are read from each data set. Using a one-to-one merge to combine data sets with different values of common variables can also produce undesirable results. If a variable exists in more than one data set, the value from the last data set read is the one that is written to the new data set even if the value is missing. Once SAS has processed all observations in a data set, all subsequent observations in the new data set have missing values for the variables that are unique to that data set.  [cautionend]

SAS(R) 9.2 Language Reference: Concepts, Second Edition

View solution in original post


All Replies
Solution
‎03-25-2014 11:18 AM
Super User
Posts: 19,878

Re: Merging SAS data sets

CAUTION:
Avoid using duplicate values or different values of common variables. 

One-to-one merging with data sets that contain duplicate values of common variables can produce undesirable results. If a variable exists in more than one data set, the value from the last data set that is read is the one that is written to the new data set. The variables are combined exactly as they are read from each data set. Using a one-to-one merge to combine data sets with different values of common variables can also produce undesirable results. If a variable exists in more than one data set, the value from the last data set read is the one that is written to the new data set even if the value is missing. Once SAS has processed all observations in a data set, all subsequent observations in the new data set have missing values for the variables that are unique to that data set.  [cautionend]

SAS(R) 9.2 Language Reference: Concepts, Second Edition

Regular Contributor
Posts: 216

Re: Merging SAS data sets

Thanks. I was thinking the last.

Paul

Super User
Posts: 5,518

Re: Merging SAS data sets

Applying those rules, the answer is "It depends."  If you have a 1-to-1 match, then the value comes from the second data set.  If you have a 1-to-many match, the value also comes from the second data set.  If you have a many-to-1 match, the answer changes.  The value on the first observation for an ID comes from the second data set, and on any subsequent observations for the same ID comes from the first data set.

One rule, not mentioned above, is that SAS never reads the same observation twice.  It reads each observation once and retains whatever data values have been read.  So the process is more like this.  For the current ID, see if there is another unread observation in GROUP1.  If so, read it in (replacing the common variable).  Then for that same ID, see if there is another unread observation in GROUP2.  If so, read it in (replacing the common variable).  Whatever value was read last is the one that you end up with.

Super User
Posts: 7,868

Re: Merging SAS data sets

My personal rule is: don't do it. Even if it works, it is very hard for someone (and that someone may be YOU) who later reads the code to figure out what is actually happening and what was supposed to happen.

Use the rename= data set option to clear up the ambiguities and write code to assign the values as wanted, then drop unwanted variables from the output data set.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 293 views
  • 0 likes
  • 4 in conversation