BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
David_Billa
Rhodochrosite | Level 12

Appreciate if someone of you help me understand this match merge. I want to know  in the final result MFGCOMBINE3, why the variables MATRL_NBR and PLANNED_ZNL values are missing in second record and why the value of REASDES got updated in first record. 

 

data MFGCOMBINE1;
	MATRL_NBR=.;
	FISC_YR=2019;
	FISC_PD='012/2019';
	FISC_WK='049/2019';
	PLANT=337;
	MATERIAL2=3010030074;
	PLANNED_ZNL=.;
	REASDES='DO NOT USE Line Performance(LP)';
run;

data MFGCOMBINE2;
	MATRL_NBR=.;
	FISC_YR=2019;
	FISC_PD='012/2019';
	FISC_WK='049/2019';
	PLANT=337;
	MATERIAL2=3010030074;
	PLANNED_ZNL=.;
	REASDES='DO NOT USE Line Performance(LP)';
run;

data MFGCOMBINE;
	set MFGCOMBINE1 MFGCOMBINE2;
run;

data PRIORACTUALS2019A;
	MATRL_NBR=3010030074;
	FISC_YR=2019;
	FISC_PD='012/2019';
	FISC_WK='049/2019';
	PLANT=337;
	MATERIAL2=3010030074;
	PLANNED_ZNL=68478;
	REASDES='Line Performance';
run;

DATA MFGCOMBINE3;
	MERGE WORK.MFGCOMBINE
		WORK.PRIORACTUALS2019A;
	BY FISC_YR FISC_PD FISC_WK PLANT MATERIAL2;
RUN;
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Like the SET statement, the MERGE statement is, at its core, based on sequential access of the named datasets, where the sequential processing occurs within matching BY groups of the merged datasets.

 

So, here's what happens from MERGE mfgcombine prioractuals2019a;  BY ....; :

 

  1. SAS reads the first matching record from mfgcombine.
  2. SAS reads the first matching record (i.e. same BY values) from prioractuals2019a.  Any other common variables  (MATRL_NBR, PLANNED_ZNL, and REASDES in your case) will have the values read from mfgcombine overwritten by values from prioractuals2019a.
  3. The first record is output.  This produced the results you were presumably expecting.   All the variables read in from mfgcombine and prioractuals2019a are not reset to missing after output.  They are retained until and unless they are replaced in step 4 or 5 below.

  4. SAS reads the next matching record (if any) from mfgcombine.  If there isn't such a record, the variables named in mfgcombine are not (yet) overwritten, nor set to missing.  But in your case there is a second mfgcombine record, which replaces the retained values for MATRL_NBR, PLANNED_ZNL, and REASDES.  That is, it replaces the values that were retrieved in step1 and then step 2 (from prioractuals2019a)  above.
  5. SAS finds no second record in prioractuals2019a, so no values were modified.
  6. SAS outputs the second record.

 

A couple of notes:  Your example contains no variables that belong to just one of the data sets.  If it did, then values of those variables would be retained, like all others, from step 3 to step 4.  But in your case there is no second record from prioractuals2019a.  So any variable present only in the single prioractuals2019a record would be unchanged in all subsequent output records.

 

Also the retain behavior is overwritten when there is a change in any BY variable.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

10 REPLIES 10
David_Billa
Rhodochrosite | Level 12

So SAS match merge is like update statement on by fields and it's neither a full join nor inner/left join?

Tom
Super User Tom
Super User

@David_Billa wrote:

So SAS match merge is like update statement on by fields and it's neither a full join nor inner/left join?


Ignore the SQL concepts of joins.  Just process the observations one by one following the rules.

 

Consider each dataset like a face up deck of cards so you can see the value of the next card before you pick it up.

The MERGE is going to combine the decks so that the ones with the same BY variable values match. 

The general process is you look at the top cards in each input and find the "smallest" one.  If that starts a new group then erase everything.  Take the first card for this new group (scanning the decks from left to right) and read the values from the card into the corresponding variables. If there are top cards from the other decks for this group then read them and update the variables.  So if the same variables are coming from two decks (dataset) then the value of the last one read "wins". Once you have scanned across all decks copy the current values to a card and stick into the output deck.  Now you are ready to start the next card.  Are there any more cards visible on the input decks for this by group?  If so again read one card for this group from left to write as before and then write out the result.  When there are no more visible cards for this group then clear the variables and find the next lowest group and repeat.

 

So if there is a variable that is only in the second deck and that deck only had one card for this BY group then its value is never changed when the second card for this by group is read from the first deck.  That is how a MERGE statement can work add lookup values into a dataset in a many to 1 situation.

 

David_Billa
Rhodochrosite | Level 12
There are multiple examples in the provided link. May I know which example
should I refer for the program which I shown in my post?
Kurt_Bremser
Super User

You have a two-to-one merge, with two variables in common that are not part of the BY.

In the first match, the values from the first dataset are read, then overwritten with the values from the second; for the second match, the values from the second dataset are retained (as no further matching observations are found in the second dataset), and then overwritten with the values from the first dataset.

 

You will do best by not thinking of a data step MERGE as some kind of join, as it isn't (although it can be used for such an operation, with proper diligence). It's also not an update, the data step provides the UPDATE statement for this.

tarheel13
Rhodochrosite | Level 12

Thank you @Kurt_Bremser . I referred to the documentation because the illustrated example explains what happens better than I can 🙂

Tom
Super User Tom
Super User

What are you trying to do?  

When you MERGE two datasets it is normally to add additional VARIABLES from the second dataset to the first dataset.

In your example the two dataset have the exact same variables.

So what is it that you are trying to accomplish by merging them?

 

One situation where it makes sense to have both datasets have the same set of variables (or the second dataset to have a subset of the variables in the first) is when the intent is to apply transactions to the first dataset.   If that is what you are trying to do then look at the UPDATE statement.  That will ignore the missing values in the transaction dataset, leaving the values in the original dataset for that by group unchanged.

mkeintz
PROC Star

Like the SET statement, the MERGE statement is, at its core, based on sequential access of the named datasets, where the sequential processing occurs within matching BY groups of the merged datasets.

 

So, here's what happens from MERGE mfgcombine prioractuals2019a;  BY ....; :

 

  1. SAS reads the first matching record from mfgcombine.
  2. SAS reads the first matching record (i.e. same BY values) from prioractuals2019a.  Any other common variables  (MATRL_NBR, PLANNED_ZNL, and REASDES in your case) will have the values read from mfgcombine overwritten by values from prioractuals2019a.
  3. The first record is output.  This produced the results you were presumably expecting.   All the variables read in from mfgcombine and prioractuals2019a are not reset to missing after output.  They are retained until and unless they are replaced in step 4 or 5 below.

  4. SAS reads the next matching record (if any) from mfgcombine.  If there isn't such a record, the variables named in mfgcombine are not (yet) overwritten, nor set to missing.  But in your case there is a second mfgcombine record, which replaces the retained values for MATRL_NBR, PLANNED_ZNL, and REASDES.  That is, it replaces the values that were retrieved in step1 and then step 2 (from prioractuals2019a)  above.
  5. SAS finds no second record in prioractuals2019a, so no values were modified.
  6. SAS outputs the second record.

 

A couple of notes:  Your example contains no variables that belong to just one of the data sets.  If it did, then values of those variables would be retained, like all others, from step 3 to step 4.  But in your case there is no second record from prioractuals2019a.  So any variable present only in the single prioractuals2019a record would be unchanged in all subsequent output records.

 

Also the retain behavior is overwritten when there is a change in any BY variable.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
yabwon
Onyx | Level 15

Hi,

 

I would go with this paper: "How MERGE Really Works" by Bob Virgile

https://stats.oarc.ucla.edu/wp-content/uploads/2016/02/ad155.pdf

It's old but not obsolete, and does what it suppose to do, i.e. explains how MERGE really woks 🙂

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 981 views
  • 6 likes
  • 6 in conversation