DATA Step, Macro, Functions and more

Merge VS. SQL

Reply
Contributor alr
Contributor
Posts: 39

Merge VS. SQL

Hi,
I thought merge of data by a key and with an “if” limiting the data to one of the merge tables was exactly like a left join in proc sql.
But when both tables have doublets (or repeats of BY values) MERGE and LEFT JOIN will not give the same result.

 

How can I change the SQL statement to get the same result as merge (I’m trying to convert SAS code to SQL).

 

Here is my simplified example:

(In this example merge gives 4 observation and left join gives 6 observations - I need the 4 observation but with sql statment - is this possible?)

 

DATA one;

INPUT x y $1.;

DATALINES;

1 A

1 B

2 C

3 D

;

run;

DATA two;

INPUT x z $2.;

DATALINES;

1 Z1

1 Z2

2 Z

3 Z

;

run;

proc sort data=one; by x; run;

proc sort data=two; by x; run;

 

data mergedata;

merge one(in=i) two;

by x;

if i;

run;

 

proc sql;

create table leftjoindata as

select one.*

            ,two.z

from one left join two

on one.x = two.x

;

quit;

 

Super User
Posts: 19,767

Re: Merge VS. SQL

It's almost always wrong to use a merge when your tables have duplicate key/by variables. 

I would verify the original requirement first to make sure it was the intended results. 

 

What often happens is that a table doesn't have duplicates initially but starts at some point and slips through. 

 

I'm not sure how you'd code the SQL because it ends up passing a value across rows if it was missing or should be missing. 

 

http://www.sascommunity.org/wiki/Many-to-Many_MERGE

Super User
Posts: 5,424

Re: Merge VS. SQL

It's almost impossible to mimic this behaviour. This since the data step processes row by row. SQL on the other hand is column oriented and nothe aware of source data order.
What is your requirement? Get your keys corrected and the n chose the data step it SQL depending on the type processing required for the specific situation.
Data never sleeps
Super User
Posts: 7,757

Re: Merge VS. SQL

If you have multiple occurences of by values in a data step merge, and the program produced the desired result, then the data step was used for a purpose. You can't reproduce its behaviour in SQL, as SQL always produces cartesian joins in such cases.

Bottom line: stay with the data step logic.

Both data steps and SQL have their place.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 3,250

Re: Merge VS. SQL

Posted in reply to KurtBremser

Well said @KurtBremser. I've lost count of the number of posts that start by saying they have a process working perfectly in a DATA or SQL step and want to convert it to the other. My immediate reaction is if it isn't broken why are you trying to fix it?

 

Of course there will sometimes be valid reasons for conversion, but if you are already using the right tool for the job why change it?

Contributor alr
Contributor
Posts: 39

Re: Merge VS. SQL

Thank you.
But unfortunately I don’t have access to data with SAS anymore and have to convert the code to Oracle SQL.
Otherwise I’m totally agreed with you all.

Super User
Posts: 7,757

Re: Merge VS. SQL

Well, in this case it's best to rebuild the functionality from scratch with Oracle SQL tools.

Since you'll have to rewrite everything (that's not already standard SQL) anyway, you better use the opportunity for optimisation. Any code that has lived longer than 3 years has accumulated cruft that can be weeded out, and new tools can open the way to better solutions.

Just make sure to make it clear to your customers that the platform switch will incur beyond-marginal cost.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 6 replies
  • 367 views
  • 8 likes
  • 5 in conversation