BookmarkSubscribeRSS Feed
Sarah2913
Calcite | Level 5

I am in the process of converting data merges into proc sql joins. I have the following merge

DATA test ;
MERGE  data 1
data 2 (KEEP=one two three IN=_2);
BY one two ;
RUN;

 

This is my proc sql code 

proc sql; 

create table test1 as 

select a.*, b.one, b.two,b.three from data1 as a

outer join data2 as b

on a.one=b.one and a.two=b.two;

quit; 

I am getting 11 extra observations in the proc sql than the merge. I am not sure why that is happening. Any help greatly appreciated. 

 

5 REPLIES 5
novinosrin
Tourmaline | Level 20

Hi @Sarah2913   

 

1. Please check for duplicates in both tables

2. If yes, proc sql will do many to many join

3. Know your data

 

 

Kurt_Bremser
Super User

Look at the log of your data step. I bet you a beer that you'll have a NOTE about more than one dataset with repeats of by values.

 

In many-to-many joins, SQL behaves different than a data step merge. And in most cases, doing a many-to-many join in a data step merge is not correct, as it tends to lose data.

Sarah2913
Calcite | Level 5

I do have duplicate data. I understand that merge was not the correct way to do this but is there anyway I can get the merge and the join to match up. I have removed duplicates before doing the join and that did not work. I can not remove the duplicates before doing the merge as I have to replicate the merge so can not change this code. Is there any way around this. 

Kurt_Bremser
Super User

You can't duplicate the behaviour of a data step merge in SQL. SQL does not have the tools for it, it will always build a cartesian join.

Either you re-design your data model, or you stay with the data step.

 

You could work around if both your datasets have the same number of repeats in both datasets for any given by-group. But then you would have to add an additional counter for each group to subsequently use in the join, and that is once again achieved in a data step.

Astounding
PROC Star
It's time to examine the results. If MERGE produces the wrong result but nobody noticed, would you want to replicate that with SQL?

Take just one example where both incoming data sets contain multiple observations for the same combination of the variables ONE and TWO. Decide what the correct result would be. Compare that to the results from both SQL and MERGE.

In my view, it's more important to get the right answer. Replicating the wrong answer should be a secondary consideration.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1432 views
  • 0 likes
  • 4 in conversation