Re: A left join in a proc sql instead of merge?

GN0001 · Posted 08-03-2021 06:49 PM

Hello team,

Can we use a proc sql and left join for this code below?

Regards,

blueblue

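data out.mydata;
  merge thisdata(in=a) thatdata(in=b);
  by hkey;
  format find $15.;
  if a and b then Find = 'Both';
  /* a and not b: the key is in thisdata but has no match in thatdata (label kept as posted) */
  if a and not b then Find = 'not_in_thisdata';
  if a then output;  /* keep every row from thisdata, as a left join would */
run;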

Blue Blue

mkeintz · Posted 08-03-2021 10:42 PM

@GN0001 wrote:

> Hello team,
>
> Can we use a proc sql and left join for this code below?
>
> Regards,
>
> blueblue
>
> data out.mydata;
> merge thisdata(in=a) thatdata(in=b);
> by hkey;
> format find $15.;
> if a and b then Find = 'Both';
> if a and not b then Find = 'not_in_thisdata';
> if a then output;
> run;

If ~~(and only if)~~ each unique HKEY value has a one:one, many:one, or one:many match (i.e. no many:many matches), then yes, you can reproduce the merge code you showed with PROC SQL:

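proc sql noprint;
  create table out.mydata as
  /* select * pulls hkey from both tables; SAS keeps the first and logs a warning */
  select *,
     /* comparing with . assumes hkey is numeric */
     case thatdata.hkey
       when . then "Not_in_thisdata"
       else "Both"
     end
     as find
  from thisdata left join thatdata
  on thisdata.hkey=thatdata.hkey;
quit;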

But why? If the data are already sorted, then the data step merge will be faster for large data sets.
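
To make the many:many caveat concrete, here is a minimal sketch with made-up data. The tables left_mm and right_mm and their columns are hypothetical, not from the thread. With two rows per key on each side, the data step merge pairs rows one by one within the BY group, while the SQL join produces every combination:

```sas
/* Hypothetical test data: hkey=1 appears twice in each table (many:many) */
data left_mm;
  input hkey lval $;
  datalines;
1 a
1 b
;
run;

data right_mm;
  input hkey rval $;
  datalines;
1 x
1 y
;
run;

/* Data step merge: rows are paired in order within the BY group */
data merged;
  merge left_mm(in=a) right_mm(in=b);
  by hkey;
  if a;
run;

/* SQL left join: each left row is joined to every matching right row */
proc sql;
  create table joined as
  select a.hkey, a.lval, b.rval
  from left_mm as a left join right_mm as b
  on a.hkey = b.hkey;
quit;
```

On this input MERGED should have 2 rows and JOINED 4, which is why the two approaches only agree when no key repeats on both sides.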

ChrisNZ · Posted 08-03-2021 11:14 PM

>If (and only if), for each unique HKEY value either have a many:one or a one:many match (i.e. no many:many matches), then yes you can reproduce the merge code you showed with a PROC SQL

one:one is also a valid case for SQL to match a data step logic

Another condition for the SQL shown to identify missing data in the right table is that the key is never missing in the data.
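
One way to drop the "no missing keys" condition is to test a flag carried by the right table instead of the key itself. This is only a sketch: the sample rows, the columns x and y, the in_b flag, and the output name work.flagged are all made up for illustration.

```sas
/* Hypothetical sample data: hkey is missing in one row of each table */
data thisdata;
  input hkey x;
  datalines;
. 10
1 11
2 12
;
run;

data thatdata;
  input hkey y;
  datalines;
. 20
1 21
;
run;

proc sql;
  create table work.flagged as
  select a.hkey, a.x, b.y,
         /* in_b is 1 only when a row from thatdata was actually joined */
         case when b.in_b = 1 then 'Both'
              else 'not_in_thisdata'
         end as find length=15
  from thisdata as a
       left join
       (select hkey, y, 1 as in_b from thatdata) as b
  on a.hkey = b.hkey;
quit;
```

PROC SQL treats two missing values as equal in a join condition, so the row with the missing key finds its partner, and in_b records that match even though b.hkey is itself missing. The same test also works for character keys, since it no longer compares the key with a numeric missing value.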

> why? If the data are already sorted, then the data step merge will be faster for large data sets.

I agree in principle. If the sort order is flagged as validated, the difference should be minimal, though, as SQL will not re-sort.

Sadly, SAS forgot to be clever here and does not set this flag as it should, except in a few cases.
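
A quick way to see what SAS knows about the sort order, and what PROC SQL does with it, is sketched below. The data set names come from the thread; work.sql_check is a hypothetical output name, and the _METHOD log codes are quoted from memory, so treat the comments as assumptions to check in your own session.

```sas
/* PROC SORT records the sort order in the data set header */
proc sort data=thisdata; by hkey; run;
proc sort data=thatdata; by hkey; run;

/* The "Sort Information" section of the output reports Sortedby and Validated */
proc contents data=thisdata;
run;

/* _METHOD writes the chosen join plan to the log; a sort step in the plan
   (sqxsort, as far as I recall) would mean SQL re-sorted the input */
proc sql _method;
  create table work.sql_check as
  select a.hkey
  from thisdata as a left join thatdata as b
  on a.hkey = b.hkey;
quit;
```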

High-Performance SAS Coding - Third Edition

GN0001 · Posted 08-04-2021 02:31 AM

Hi,

Do we need to bring the code below in? A left join already brings in all rows from the left table and all the matches from the right table.

  case thatdata.hkey
       when . then "Not_in_thisdata"
       else "Both"
     end
     as find

Respectfully,

blueblue

ChrisNZ · Posted 08-04-2021 02:38 AM

> Do we need to bring the code below in?

The code you show will identify records present in the left table and not present in the right table, provided the data does not contain missing key values.

ChrisNZ · Posted 08-03-2021 11:32 PM

All your questions about the different types of merge/join would be answered much faster, and much better, by testing the syntax and examples you ask about yourself. Doing is a much better way to learn than asking questions.

GN0001 · Posted 08-04-2021 02:32 AM

Hello,

I did test it, but I couldn't figure it out. I need an assurance.

Regards,

blueblue

Kurt_Bremser · Posted 08-04-2021 03:34 AM

@GN0001 wrote:

> Hello,
>
> I did test it, but I couldn't figure it out. I need an assurance.
>
> Regards,
>
> blueblue

<SNARK>

If you tested it, and it worked, and you don't even trust your own test, then you need to pay a visit to a shrink.

</SNARK>

There is no better way in programming to verify code than to test it. Trusting another person who took a casual glance is foolish at best.
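
As a sketch of such a test, the two results can be compared mechanically with PROC COMPARE. Here work.from_sql is a hypothetical name for the table created by the PROC SQL version, and out.mydata is the data step result from the original post; the cmp_ tables are scratch copies.

```sas
/* Sort copies of both results by all variables so row order cannot cause differences */
proc sort data=out.mydata out=work.cmp_merge;
  by _all_;
run;

proc sort data=work.from_sql out=work.cmp_sql;
  by _all_;
run;

/* Compare observation by observation; LISTALL also reports variables
   or observations found in only one of the two tables */
proc compare base=work.cmp_merge compare=work.cmp_sql listall;
run;
```

Character values are compared case-sensitively, so a label written as "Not_in_thisdata" in one version and 'not_in_thisdata' in the other would be reported as a difference.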

If you expect a trustworthy answer, we would need to test your whole code against your whole data in your environment. And then we'd start charging you.

GN0001 · Posted 08-08-2021 02:49 PM

Hello team,

I tried both and could view the results.

I am clear on this one.

Thanks,

blueblue

andreas_lds · Posted 08-09-2021 01:52 AM

Please mark the most helpful answer as solution.

andreas_lds · Posted 08-04-2021 02:59 AM

@GN0001 wrote:

> Hello team,
>
> Can we use a proc sql and left join for this code below?
>
> Regards,
>
> blueblue
>
> data out.mydata;
> merge thisdata(in=a) thatdata(in=b);
> by hkey;
> format find $15.;
> if a and b then Find = 'Both';
> if a and not b then Find = 'not_in_thisdata';
> if a then output;
> run;

So, you have a working data step, tested and giving the expected results. Why do you want to waste your time replacing it at all?
