BookmarkSubscribeRSS Feed
alr
Quartz | Level 8 alr
Quartz | Level 8

Hi,
I thought merge of data by a key and with an “if” limiting the data to one of the merge tables was exactly like a left join in proc sql.
But when both tables have doublets (or repeats of BY values) MERGE and LEFT JOIN will not give the same result.

 

How can I change the SQL statement to get the same result as merge (I’m trying to convert SAS code to SQL).

 

Here is my simplified example:

(In this example merge gives 4 observation and left join gives 6 observations - I need the 4 observation but with sql statment - is this possible?)

 

DATA one;

INPUT x y $1.;

DATALINES;

1 A

1 B

2 C

3 D

;

run;

DATA two;

INPUT x z $2.;

DATALINES;

1 Z1

1 Z2

2 Z

3 Z

;

run;

proc sort data=one; by x; run;

proc sort data=two; by x; run;

 

data mergedata;

merge one(in=i) two;

by x;

if i;

run;

 

proc sql;

create table leftjoindata as

select one.*

            ,two.z

from one left join two

on one.x = two.x

;

quit;

 

6 REPLIES 6
Reeza
Super User

It's almost always wrong to use a merge when your tables have duplicate key/by variables. 

I would verify the original requirement first to make sure it was the intended results. 

 

What often happens is that a table doesn't have duplicates initially but starts at some point and slips through. 

 

I'm not sure how you'd code the SQL because it ends up passing a value across rows if it was missing or should be missing. 

 

http://www.sascommunity.org/wiki/Many-to-Many_MERGE

LinusH
Tourmaline | Level 20
It's almost impossible to mimic this behaviour. This since the data step processes row by row. SQL on the other hand is column oriented and nothe aware of source data order.
What is your requirement? Get your keys corrected and the n chose the data step it SQL depending on the type processing required for the specific situation.
Data never sleeps
Kurt_Bremser
Super User

If you have multiple occurences of by values in a data step merge, and the program produced the desired result, then the data step was used for a purpose. You can't reproduce its behaviour in SQL, as SQL always produces cartesian joins in such cases.

Bottom line: stay with the data step logic.

Both data steps and SQL have their place.

SASKiwi
PROC Star

Well said @Kurt_Bremser. I've lost count of the number of posts that start by saying they have a process working perfectly in a DATA or SQL step and want to convert it to the other. My immediate reaction is if it isn't broken why are you trying to fix it?

 

Of course there will sometimes be valid reasons for conversion, but if you are already using the right tool for the job why change it?

alr
Quartz | Level 8 alr
Quartz | Level 8

Thank you.
But unfortunately I don’t have access to data with SAS anymore and have to convert the code to Oracle SQL.
Otherwise I’m totally agreed with you all.

Kurt_Bremser
Super User

Well, in this case it's best to rebuild the functionality from scratch with Oracle SQL tools.

Since you'll have to rewrite everything (that's not already standard SQL) anyway, you better use the opportunity for optimisation. Any code that has lived longer than 3 years has accumulated cruft that can be weeded out, and new tools can open the way to better solutions.

Just make sure to make it clear to your customers that the platform switch will incur beyond-marginal cost.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1380 views
  • 8 likes
  • 5 in conversation