08-23-2016 02:21 AM
I thought merge of data by a key and with an “if” limiting the data to one of the merge tables was exactly like a left join in proc sql.
But when both tables have doublets (or repeats of BY values) MERGE and LEFT JOIN will not give the same result.
How can I change the SQL statement to get the same result as merge (I’m trying to convert SAS code to SQL).
Here is my simplified example:
(In this example merge gives 4 observation and left join gives 6 observations - I need the 4 observation but with sql statment - is this possible?)
INPUT x y $1.;
INPUT x z $2.;
proc sort data=one; by x; run;
proc sort data=two; by x; run;
merge one(in=i) two;
create table leftjoindata as
from one left join two
on one.x = two.x
08-23-2016 02:36 AM
It's almost always wrong to use a merge when your tables have duplicate key/by variables.
I would verify the original requirement first to make sure it was the intended results.
What often happens is that a table doesn't have duplicates initially but starts at some point and slips through.
I'm not sure how you'd code the SQL because it ends up passing a value across rows if it was missing or should be missing.
08-23-2016 02:45 AM
08-23-2016 02:50 AM
If you have multiple occurences of by values in a data step merge, and the program produced the desired result, then the data step was used for a purpose. You can't reproduce its behaviour in SQL, as SQL always produces cartesian joins in such cases.
Bottom line: stay with the data step logic.
Both data steps and SQL have their place.
08-23-2016 04:08 AM
Well said @KurtBremser. I've lost count of the number of posts that start by saying they have a process working perfectly in a DATA or SQL step and want to convert it to the other. My immediate reaction is if it isn't broken why are you trying to fix it?
Of course there will sometimes be valid reasons for conversion, but if you are already using the right tool for the job why change it?
08-24-2016 03:38 AM
Well, in this case it's best to rebuild the functionality from scratch with Oracle SQL tools.
Since you'll have to rewrite everything (that's not already standard SQL) anyway, you better use the opportunity for optimisation. Any code that has lived longer than 3 years has accumulated cruft that can be weeded out, and new tools can open the way to better solutions.
Just make sure to make it clear to your customers that the platform switch will incur beyond-marginal cost.