DATA Step, Macro, Functions and more

Difference between Merge and Proc SQL left join

Reply
N/A
Posts: 0

Difference between Merge and Proc SQL left join

I noticed the following difference between the Proc SQl left join and SAS merge. I wanted to be sure whether I am correct or not and if there is a way to do the same thing using Merge.

DAtaset1

ID Post date
23456 02Aug2008
23456 03Aug2008
45678 04Aug2008

Dataset2

ID sell date
23456 09Aug2008
23456 10Aug2008



If we do a left join by Proc sql
like this
proc sql;
create table temp as
select t1.*, t2.sell date
form dataset1 t1
left join
dataset2 t2
on t1.ID = t2.ID;
quit;

we will get the following:

ID Post date sell date
23456 02Aug2008 09Aug2008
23456 02Aug2008 10Aug2008
23456 03Aug2008 09Aug2008
23456 03Aug2008 10Aug2008
45678 04Aug2008 .

But if we do a Merge like this
data temp;
Merge dataset1(in=a) dataset2;
if a;
by ID;
run;

we get the following:
ID Post date sell date
23456 02Aug2008 09Aug2008
23456 03Aug2008 10Aug2008
45678 04Aug2008 .

Isn't Merge supposed to be equivalent to left join ???

Is there any way to do the SQL left join by using data set ?
Super User
Posts: 5,424

Re: Difference between Merge and Proc SQL left join

Posted in reply to deleted_user
Data step merge has a similar functionality of an outer join, but the result depends on the data. If your relationship between the table keys are 1-M, you will see the same result. In your example, there is a M-M relationship.

This is due to the fact that SQL is acting on columns, and the data step has a more row oriented approach.

SQL tries to combine all id values between the two tables, and then select the matches (that meet the join criteria). The data step merge by will put the two tables beside each other. As long there is a match in the BY group, the data step will go to the next row in both tables until there is no match. So, if you have two rows with id=1 in table a, and three rows with id=1 in table two, you will end up with three rows in your output table. Row three will be data from row two in table a, and row three in table b.

Hope I could make this any sense...

Regards,
Linus
Data never sleeps
N/A
Posts: 0

Re: Difference between Merge and Proc SQL left join

Posted in reply to deleted_user
So there is no way to get the desired output (which left join gives) by using data merge ? Message was edited by: goyal
Super User
Posts: 5,424

Re: Difference between Merge and Proc SQL left join

Posted in reply to deleted_user
I'm not sure, very difficult in just one step believe. But why? If you know how to do it in SQL, why not stick to that...?

/Linus
Data never sleeps
N/A
Posts: 0

Re: Difference between Merge and Proc SQL left join

Posted in reply to deleted_user
data step merge does not correspond to a cartesian-type (m-n) join.

That type of join be achieved in a data step if that is your preference, by using two set statements and controlling the reads from each table, probably with key= indexed access on at least one of these tables.
I don't need that data step complexity because I think the sql join is easier to recognise, understand and therefore, maintain. Why would you seek it?

PeterC
N/A
Posts: 0

Re: Difference between Merge and Proc SQL left join

Posted in reply to deleted_user
I am seeking this because I thought it would be possible to do it by a simple Merge statement. It was quite surprising that left join and data merge are not equivalent.
N/A
Posts: 0

Re: Difference between Merge and Proc SQL left join

Posted in reply to deleted_user
MERGE will happily provide 1-N and N-1 joins (but with unexpected effects, in the nature of a data step, when variables are common to both tables but not among the list of by- variables).
It looked to me like you wanted a M-N join, because you have repeats of the by-variable on both data sets.
When by-values repeat on more than one table in a MERGE, the effect is like two diaries side-by-side. Each time a MERGE statement is executed a row of data is taken from each table in the statement until there is no more data for that by-value in any table on the statement. I expect it is better described in the proper documentation.

good luck


PeterC
Ask a Question
Discussion stats
  • 6 replies
  • 399 views
  • 0 likes
  • 2 in conversation