Appreciate if someone of you help me understand this match merge. I want to know in the final result MFGCOMBINE3, why the variables MATRL_NBR and PLANNED_ZNL values are missing in second record and why the value of REASDES got updated in first record.
data MFGCOMBINE1; MATRL_NBR=.; FISC_YR=2019; FISC_PD='012/2019'; FISC_WK='049/2019'; PLANT=337; MATERIAL2=3010030074; PLANNED_ZNL=.; REASDES='DO NOT USE Line Performance(LP)'; run; data MFGCOMBINE2; MATRL_NBR=.; FISC_YR=2019; FISC_PD='012/2019'; FISC_WK='049/2019'; PLANT=337; MATERIAL2=3010030074; PLANNED_ZNL=.; REASDES='DO NOT USE Line Performance(LP)'; run; data MFGCOMBINE; set MFGCOMBINE1 MFGCOMBINE2; run; data PRIORACTUALS2019A; MATRL_NBR=3010030074; FISC_YR=2019; FISC_PD='012/2019'; FISC_WK='049/2019'; PLANT=337; MATERIAL2=3010030074; PLANNED_ZNL=68478; REASDES='Line Performance'; run; DATA MFGCOMBINE3; MERGE WORK.MFGCOMBINE WORK.PRIORACTUALS2019A; BY FISC_YR FISC_PD FISC_WK PLANT MATERIAL2; RUN;
Like the SET statement, the MERGE statement is, at its core, based on sequential access of the named datasets, where the sequential processing occurs within matching BY groups of the merged datasets.
So, here's what happens from MERGE mfgcombine prioractuals2019a; BY ....; :
A couple of notes: Your example contains no variables that belong to just one of the data sets. If it did, then values of those variables would be retained, like all others, from step 3 to step 4. But in your case there is no second record from prioractuals2019a. So any variable present only in the single prioractuals2019a record would be unchanged in all subsequent output records.
Also the retain behavior is overwritten when there is a change in any BY variable.
So SAS match merge is like update statement on by fields and it's neither a full join nor inner/left join?
@David_Billa wrote:
So SAS match merge is like update statement on by fields and it's neither a full join nor inner/left join?
Ignore the SQL concepts of joins. Just process the observations one by one following the rules.
Consider each dataset like a face up deck of cards so you can see the value of the next card before you pick it up.
The MERGE is going to combine the decks so that the ones with the same BY variable values match.
The general process is you look at the top cards in each input and find the "smallest" one. If that starts a new group then erase everything. Take the first card for this new group (scanning the decks from left to right) and read the values from the card into the corresponding variables. If there are top cards from the other decks for this group then read them and update the variables. So if the same variables are coming from two decks (dataset) then the value of the last one read "wins". Once you have scanned across all decks copy the current values to a card and stick into the output deck. Now you are ready to start the next card. Are there any more cards visible on the input decks for this by group? If so again read one card for this group from left to write as before and then write out the result. When there are no more visible cards for this group then clear the variables and find the next lowest group and repeat.
So if there is a variable that is only in the second deck and that deck only had one card for this BY group then its value is never changed when the second card for this by group is read from the first deck. That is how a MERGE statement can work add lookup values into a dataset in a many to 1 situation.
reading the documentation may help
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lepg/p1phdzzlrc1wi8n1bpigstwt851o.htm
You have a two-to-one merge, with two variables in common that are not part of the BY.
In the first match, the values from the first dataset are read, then overwritten with the values from the second; for the second match, the values from the second dataset are retained (as no further matching observations are found in the second dataset), and then overwritten with the values from the first dataset.
You will do best by not thinking of a data step MERGE as some kind of join, as it isn't (although it can be used for such an operation, with proper diligence). It's also not an update, the data step provides the UPDATE statement for this.
I specifically point you to this example in @tarheel13 's link:
which explicitly covers the mechanics of a one-to-many merge with common variables.
Edit: fixed typo.
Thank you @Kurt_Bremser . I referred to the documentation because the illustrated example explains what happens better than I can 🙂
What are you trying to do?
When you MERGE two datasets it is normally to add additional VARIABLES from the second dataset to the first dataset.
In your example the two dataset have the exact same variables.
So what is it that you are trying to accomplish by merging them?
One situation where it makes sense to have both datasets have the same set of variables (or the second dataset to have a subset of the variables in the first) is when the intent is to apply transactions to the first dataset. If that is what you are trying to do then look at the UPDATE statement. That will ignore the missing values in the transaction dataset, leaving the values in the original dataset for that by group unchanged.
Like the SET statement, the MERGE statement is, at its core, based on sequential access of the named datasets, where the sequential processing occurs within matching BY groups of the merged datasets.
So, here's what happens from MERGE mfgcombine prioractuals2019a; BY ....; :
A couple of notes: Your example contains no variables that belong to just one of the data sets. If it did, then values of those variables would be retained, like all others, from step 3 to step 4. But in your case there is no second record from prioractuals2019a. So any variable present only in the single prioractuals2019a record would be unchanged in all subsequent output records.
Also the retain behavior is overwritten when there is a change in any BY variable.
Hi,
I would go with this paper: "How MERGE Really Works" by Bob Virgile
https://stats.oarc.ucla.edu/wp-content/uploads/2016/02/ad155.pdf
It's old but not obsolete, and does what it suppose to do, i.e. explains how MERGE really woks 🙂
Bart
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.