topic Re: Merging data step vs SQL in New SAS User

Merging data step vs SQL

Sarah2913 — Tue, 26 Feb 2019 16:33:02 GMT

I have the following SAS code

data merged_data;

merge data1 (in=_1) data2 (in_2) ;

by ID ;

if _1 and _2;

run;

I believe this is similiar to an inner join in SQL but not always the same. Could anyone please explain the difference?

Thank in advance.

Sarah

Re: Merging data step vs SQL

novinosrin — Tue, 26 Feb 2019 16:39:00 GMT


data a;
input a;
cards;
1
1
2
2
3
3
;

data b;
input a;
cards;
1
1
2
2
3
3
;

data want;
merge a(in=in1) b(in=in2);
by a;
if in1 and in2;
run;

/*please notice n*m many to many*/
proc sql;
create table w as
select a.a,b.a as b
from a a,b b
where a.a=b.a;
quit;

Re: Merging data step vs SQL

novinosrin — Tue, 26 Feb 2019 16:42:08 GMT

So, as long as it is one to many both would mostly likely yield the same results. However many to many makes sql yield different results for the reason it does a cartesian between by groups n*m as opposed to merge that joins by position

Re: Merging data step vs SQL

Reeza — Tue, 26 Feb 2019 16:43:18 GMT

SAS data step doesn't do a many to many match, whereas SQL does. Otherwise most joins can be replicated with a data step merge.

I generally join using only SQL, more control and easier to understand in my opinion.

@Sarah2913 wrote:

I have the following SAS code

data merged_data;

merge data1 (in=_1) data2 (in_2) ;

by ID ;

if _1 and _2;

run;

I believe this is similiar to an inner join in SQL but not always the same. Could anyone please explain the difference?

Thank in advance.

Sarah

Re: Merging data step vs SQL

Sarah2913 — Wed, 27 Feb 2019 09:32:07 GMT

Thanks fpr the reply. Is there anyway to repicate the merge exactly using sql?

Re: Merging data step vs SQL

Reeza — Wed, 27 Feb 2019 16:03:49 GMT

I don't know but in my experience, no one ever wants the data to do what the SAS data step is doing.

If you're doing straight code conversion for someone, flag this as a possible issue with the model/data.

If you're doing this as homework, my question would be does this actually occur in the data? Is this an issue you're worried about?
If you don't have a many to many, but a left or inner join, those can usually be replicated in some form.

proc sql;
create table want as
select a.*, b.*
from have1 as t1
inner join have2 as t2
on t1.ID=t2.ID;
quit;

@Sarah2913 wrote:
Thanks fpr the reply. Is there anyway to repicate the merge exactly using sql?

Re: Merging data step vs SQL

novinosrin — Wed, 27 Feb 2019 19:00:31 GMT

HI @Sarah2913 You can replicate if you have rownum for each by group akin to DB sql like oracle or teradata i.e partition by



data a;
input a;
cards;
1
1
2
2
2
2
2
3
3
;
/*Partition by rownum for each by group*/
data a;
set a;
by a;
if first.a then n=1;
else n+1;
run;


data b;
input a;
cards;
1
1
1
2
3
3
3
;
/*Partition by rownum for each by group*/
data b;
set b;
by a;
if first.a then n1=1;
else n1+1;
run;


data want_merge(drop=n:);
merge a(in=in1) b(in=in2);
by a;
if in1 and in2;
run;

/*replicate merge using many to many to make it one to many*/
proc sql;
create table want_sql(drop=n2) as
select distinct a.a,b.a as b,(max(n,n1)) as n2
from a a inner join b b
on  a.a=b.a;
quit;

Of course the above does involve some logic , nevertheless that's a fun experiment