Can someone help me understand how to write the below match merge code using proc sql?
DATA WORK.ACT MERGE WORK.ATTAIN2 (in=A) WORK.ACTL2 (in=B); BY FISC_YR FISC_PD FISC_WK PO PLANT MATERIAL2; RUN;
I tried using Full Join and also with Where clause as shown below to rewrite match merge but total record count is not matching. Why so? How to write match merge using sql?
PROC SQL;
CREATE TABLE ACT AS
SELECT A.* ,B.*
FROM WORK.ATTAIN2 A FULL JOIN WORK.ACTL2 B
ON A.MATERIAL2=B.MATERIAL2 AND A.PLANT=B.PLANT AND A.PO=B.PO AND A.FISC_YR=B.FISC_YR AND A.FISC_PD=B.FISC_PD AND A.FISC_WK=B.FISC_WK;
RUN;
PROC SQL;
CREATE TABLE ACT AS
select B.*,A.SNPSHT_DT,A.IP_BRAND,A.IP_BU,A.CATG,A.PLANNED_CS,A.PLANNED_LB,A.PLANNED_ZNL
from ACTL2 as B ,ATTAIN2 as A
where B.FISC_YR=A.FISC_YR AND B.FISC_PD=A.FISC_PD AND B.FISC_WK=A.FISC_WK AND B.PO=A.PO AND
B.PLANT=A.PLANT AND B.MATERIAL2=A.MATERIAL2;
RUN;
Please have a look at this paper https://support.sas.com/resources/papers/proceedings/proceedings/sugi23/Advtutor/P47.pdf.
Some representative test data set will help in testing your code.
When your data sets contain a many-to-many match, there is no practical way to get the same result. That situation would occur when both of your data sets contain more than one observation for the same values of your "joining" variables.
To understand why, you would need to study up on how SQL handles a many-to-many match, and how MERGE handles a many-to-many match. If you can describe either one of those, I'll volunteer to describe how the other tool works.
You already know they are not identical for all circumstances. So what part of the problem do you already understand?
When there is a many-to-many match, what do you want the matched result to be? Do you understand what I mean when I say a "many-to-many match" ?
@David_Billa wrote:
We're working on SAS to R migration. We are unable to match the results
with R and SAS when we assumed match merge as full join while coding in R.
This may not be the right forum to ask this question but still if you have
any idea how to code match merge in R then it would be helpful for us.
Considering that you have asked multiple questions about Merge versus join I have a hard time understanding why you would assume, given the responses, that a SAS Merge is a "full join" and especially not with many-to-many merge.
Hello @David_Billa
While match merge is a wonderful tool. It has it's pitfalls. As pointing out by @Astounding many to many joins is one case. The paper i listed in my earlier post list's some other scenarios where match merge will not give the correct result.
it is better to use PROC SQL to do the join process and compare with R.
You can but does the resulting table meet your requirements?
@David_Billa wrote:
Well, so can we use Proc SQL full join then instead of match merge?
No, if there's a many-to-many merge and the special behavior of the data step in this situation was accounted for by the original coder (if it wasn't accounted for, you may have had incorrect data all the time).
Look at your data (Maxim 3). It may be that the original coder avoided a deduplication step because it would not change the result of the data step, but for SQL the dedup is needed to avoid the cartesian join.
Regarding a SAS to R "migration": R is good at statistics, but sucks at ETL, where SAS excels thanks to the data step language (and other tools). Welcome to the world of Sisyphos. Instead of trying a step-by-step "migration", first describe the logic of the whole priocess, and then implement it in R from scratch.
You need to first determine what the deduplication needs to achieve
This will determine if a simple SELECT DISTINCT is sufficient, or if you need to use summary functions and GROUP BY.
Doesn't R have an equivalent for PROC SORT which you can use? Or are you doing all your ETL now in a SQL database?
"Well, so can we use Proc SQL full join then instead of match merge?"
Yes you can.
@Sajid01 wrote:
"Well, so can we use Proc SQL full join then instead of match merge?"
Yes you can.
... but don't be surprised.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.