Merging data step vs SQL

Sarah2913 · Posted 02-26-2019 11:33 AM

I have the following SAS code

data merged_data;

merge data1 (in=_1) data2 (in_2) ;

by ID ;

if _1 and _2;

run;

I believe this is similiar to an inner join in SQL but not always the same. Could anyone please explain the difference?

Thank in advance.

Sarah

novinosrin · Posted 02-26-2019 11:39 AM


data a;
input a;
cards;
1
1
2
2
3
3
;

data b;
input a;
cards;
1
1
2
2
3
3
;

data want;
merge a(in=in1) b(in=in2);
by a;
if in1 and in2;
run;

/*please notice n*m many to many*/
proc sql;
create table w as
select a.a,b.a as b
from a a,b b
where a.a=b.a;
quit;

novinosrin · Posted 02-26-2019 11:42 AM

So, as long as it is one to many both would mostly likely yield the same results. However many to many makes sql yield different results for the reason it does a cartesian between by groups n*m as opposed to merge that joins by position

Reeza · Posted 02-26-2019 11:43 AM

SAS data step doesn't do a many to many match, whereas SQL does. Otherwise most joins can be replicated with a data step merge.

I generally join using only SQL, more control and easier to understand in my opinion.

@Sarah2913 wrote:

I have the following SAS code

data merged_data;

merge data1 (in=_1) data2 (in_2) ;

by ID ;

if _1 and _2;

run;

I believe this is similiar to an inner join in SQL but not always the same. Could anyone please explain the difference?

Thank in advance.

Sarah

Sarah2913 · Posted 02-27-2019 04:32 AM

Thanks fpr the reply. Is there anyway to repicate the merge exactly using sql?

Reeza · Posted 02-27-2019 11:03 AM

I don't know but in my experience, no one ever wants the data to do what the SAS data step is doing.

If you're doing straight code conversion for someone, flag this as a possible issue with the model/data.

If you're doing this as homework, my question would be does this actually occur in the data? Is this an issue you're worried about?
If you don't have a many to many, but a left or inner join, those can usually be replicated in some form.

proc sql;
create table want as
select a.*, b.*
from have1 as t1
inner join have2 as t2
on t1.ID=t2.ID;
quit;

@Sarah2913 wrote:
Thanks fpr the reply. Is there anyway to repicate the merge exactly using sql?

novinosrin · Posted 02-27-2019 02:00 PM

HI @Sarah2913 You can replicate if you have rownum for each by group akin to DB sql like oracle or teradata i.e partition by



data a;
input a;
cards;
1
1
2
2
2
2
2
3
3
;
/*Partition by rownum for each by group*/
data a;
set a;
by a;
if first.a then n=1;
else n+1;
run;


data b;
input a;
cards;
1
1
1
2
3
3
3
;
/*Partition by rownum for each by group*/
data b;
set b;
by a;
if first.a then n1=1;
else n1+1;
run;


data want_merge(drop=n:);
merge a(in=in1) b(in=in2);
by a;
if in1 and in2;
run;

/*replicate merge using many to many to make it one to many*/
proc sql;
create table want_sql(drop=n2) as
select distinct a.a,b.a as b,(max(n,n1)) as n2
from a a inner join b b
on  a.a=b.a;
quit;

Of course the above does involve some logic , nevertheless that's a fun experiment

Merging data step vs SQL

Re: Merging data step vs SQL

Re: Merging data step vs SQL

Re: Merging data step vs SQL

Re: Merging data step vs SQL

Re: Merging data step vs SQL

Re: Merging data step vs SQL