BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Paul_NYS
Obsidian | Level 7

Hi

When you are merging two SAS data sets where both have one or more of the same column (and not used in the 'by' statement: in below example 'age'), which data set's value is retained as the value in the final, merged data set?

Example:

group1:

ID      name     age     startdate

group2:

ID     age     salary

data group3;

merge group1 group2;

by ID;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
CAUTION:
Avoid using duplicate values or different values of common variables. 

One-to-one merging with data sets that contain duplicate values of common variables can produce undesirable results. If a variable exists in more than one data set, the value from the last data set that is read is the one that is written to the new data set. The variables are combined exactly as they are read from each data set. Using a one-to-one merge to combine data sets with different values of common variables can also produce undesirable results. If a variable exists in more than one data set, the value from the last data set read is the one that is written to the new data set even if the value is missing. Once SAS has processed all observations in a data set, all subsequent observations in the new data set have missing values for the variables that are unique to that data set.  [cautionend]

SAS(R) 9.2 Language Reference: Concepts, Second Edition

View solution in original post

4 REPLIES 4
Reeza
Super User
CAUTION:
Avoid using duplicate values or different values of common variables. 

One-to-one merging with data sets that contain duplicate values of common variables can produce undesirable results. If a variable exists in more than one data set, the value from the last data set that is read is the one that is written to the new data set. The variables are combined exactly as they are read from each data set. Using a one-to-one merge to combine data sets with different values of common variables can also produce undesirable results. If a variable exists in more than one data set, the value from the last data set read is the one that is written to the new data set even if the value is missing. Once SAS has processed all observations in a data set, all subsequent observations in the new data set have missing values for the variables that are unique to that data set.  [cautionend]

SAS(R) 9.2 Language Reference: Concepts, Second Edition

Paul_NYS
Obsidian | Level 7

Thanks. I was thinking the last.

Paul

Astounding
PROC Star

Applying those rules, the answer is "It depends."  If you have a 1-to-1 match, then the value comes from the second data set.  If you have a 1-to-many match, the value also comes from the second data set.  If you have a many-to-1 match, the answer changes.  The value on the first observation for an ID comes from the second data set, and on any subsequent observations for the same ID comes from the first data set.

One rule, not mentioned above, is that SAS never reads the same observation twice.  It reads each observation once and retains whatever data values have been read.  So the process is more like this.  For the current ID, see if there is another unread observation in GROUP1.  If so, read it in (replacing the common variable).  Then for that same ID, see if there is another unread observation in GROUP2.  If so, read it in (replacing the common variable).  Whatever value was read last is the one that you end up with.

Kurt_Bremser
Super User

My personal rule is: don't do it. Even if it works, it is very hard for someone (and that someone may be YOU) who later reads the code to figure out what is actually happening and what was supposed to happen.

Use the rename= data set option to clear up the ambiguities and write code to assign the values as wanted, then drop unwanted variables from the output data set.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 957 views
  • 0 likes
  • 4 in conversation