BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

Can someone help me understand how to write the below match merge code using proc sql?

 

DATA WORK.ACT 
      MERGE WORK.ATTAIN2 (in=A) WORK.ACTL2 (in=B);
      BY FISC_YR FISC_PD FISC_WK PO PLANT MATERIAL2;    
RUN;

 

I tried using Full Join and also with Where clause as shown below to rewrite  match merge but total record count is not matching. Why so? How to write match merge using sql?

 

PROC SQL; 
 CREATE TABLE ACT AS
  SELECT A.* ,B.*
  FROM WORK.ATTAIN2 A FULL JOIN WORK.ACTL2 B
  ON A.MATERIAL2=B.MATERIAL2 AND A.PLANT=B.PLANT AND A.PO=B.PO AND A.FISC_YR=B.FISC_YR AND A.FISC_PD=B.FISC_PD AND  A.FISC_WK=B.FISC_WK;
RUN;

PROC SQL;
CREATE TABLE ACT AS
select B.*,A.SNPSHT_DT,A.IP_BRAND,A.IP_BU,A.CATG,A.PLANNED_CS,A.PLANNED_LB,A.PLANNED_ZNL
         from ACTL2 as B ,ATTAIN2 as A
         where B.FISC_YR=A.FISC_YR AND B.FISC_PD=A.FISC_PD AND B.FISC_WK=A.FISC_WK AND B.PO=A.PO AND
         B.PLANT=A.PLANT AND B.MATERIAL2=A.MATERIAL2;
RUN;
13 REPLIES 13
Sajid01
Meteorite | Level 14

Please have a look at this paper https://support.sas.com/resources/papers/proceedings/proceedings/sugi23/Advtutor/P47.pdf.
Some representative test data set will help in testing your code.

Astounding
PROC Star

When your data sets contain a many-to-many match, there is no practical way to get the same result.  That situation would occur when both of your data sets contain more than one observation for the same values of your "joining" variables.

 

To understand why, you would need to study up on how SQL handles a many-to-many match, and how MERGE handles a many-to-many match.  If you can describe either one of those, I'll volunteer to describe how the other tool works.

David_Billa
Rhodochrosite | Level 12
We're working on SAS to R migration. We are unable to match the results
with R and SAS when we assumed match merge as full join while coding in R.

This may not be the right forum to ask this question but still if you have
any idea how to code match merge in R then it would be helpful for us.
Astounding
PROC Star

You already know they are not identical for all circumstances.  So what part of the problem do you already understand? 

 

When there is a many-to-many match, what do you want the matched result to be?  Do you understand what I mean when I say a "many-to-many match" ?

ballardw
Super User

@David_Billa wrote:
We're working on SAS to R migration. We are unable to match the results
with R and SAS when we assumed match merge as full join while coding in R.

This may not be the right forum to ask this question but still if you have
any idea how to code match merge in R then it would be helpful for us.

Considering that you have asked multiple questions about Merge versus join I have a hard time understanding why you would assume, given the responses, that a SAS Merge is a "full join" and especially not with many-to-many merge.

 

 

Sajid01
Meteorite | Level 14

Hello @David_Billa 
While match merge is a wonderful tool. It has it's pitfalls. As pointing out by @Astounding many to many joins is one case. The paper i listed in my earlier post list's some other scenarios where match merge will not give the correct result.
it is better to use PROC SQL to do the join process and compare with R.

David_Billa
Rhodochrosite | Level 12
Well, so can we use Proc SQL full join then instead of match merge?
SASKiwi
PROC Star

You can but does the resulting table meet your requirements?

Kurt_Bremser
Super User

@David_Billa wrote:
Well, so can we use Proc SQL full join then instead of match merge?

No, if there's a many-to-many merge and the special behavior of the data step in this situation was accounted for by the original coder (if it wasn't accounted for, you may have had incorrect data all the time).

Look at your data (Maxim 3). It may be that the original coder avoided a deduplication step because it would not change the result of the data step, but for SQL the dedup is needed to avoid the cartesian join.

 

Regarding a SAS to R "migration": R is good at statistics, but sucks at ETL, where SAS excels thanks to the data step language (and other tools). Welcome to the world of Sisyphos. Instead of trying a step-by-step "migration", first describe the logic of the whole priocess, and then implement it in R from scratch.

David_Billa
Rhodochrosite | Level 12
Thank you for the great explanation. May I know how to do the dedup step in
SQL?
Kurt_Bremser
Super User

You need to first determine what the deduplication needs to achieve

  • which values will be used in further processing
  • are the observations in the dataset which needs deduplicating already identical and act like one in the data step
  • or do you need to make a summarization first

This will determine if a simple SELECT DISTINCT is sufficient, or if you need to use summary functions and GROUP BY.

 

Doesn't R have an equivalent for PROC SORT which you can use? Or are you doing all your ETL now in a SQL database?

Sajid01
Meteorite | Level 14

"Well, so can we use Proc SQL full join then instead of match merge?"

Yes you can.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 2480 views
  • 0 likes
  • 6 in conversation