BookmarkSubscribeRSS Feed
GN0001
Barite | Level 11

Hello team,

Can we use a proc sql and left join for this code below?

Regards,

blueblue

data out.mydata;
merge thisdata(in=a) thatdata(in=b);
by hkey;
format find $15.;
if a and b then Find = 'Both';
if a and not b then Find = 'not_in_thisdata'
if a then output;
run;
Blue Blue
10 REPLIES 10
mkeintz
PROC Star

@GN0001 wrote:

Hello team,

Can we use a proc sql and left join for this code below?

Regards,

blueblue

data out.mydata;
merge thisdata(in=a) thatdata(in=b);
by hkey;
format find $15.;
if a and b then Find = 'Both';
if a and not b then Find = 'not_in_thisdata'
if a then output;
run;

 

If (and only if), for each unique HKEY value either have a one:one, many:one or a one:many match (i.e. no many:many matches), then yes you can reproduce the merge code you showed with a PROC SQL: 

proc sql noprint;
  create table out.mydata as
  select *, 
     case thatdata.hkey
       when . then "Not_in_thisdata"
       else "Both"
     end
     as find
  from thisdata left join thatdata
  on thisdata.hkey=thatdata.hkey;
quit;

But why?  If the data are already sorted, then the data step merge will be faster for large data sets.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

>If (and only if), for each unique HKEY value either have a many:one or a one:many match (i.e. no many:many matches), then yes you can reproduce the merge code you showed with a PROC SQL

one:one is also a valid case for SQL to match a data step logic

Another condition for the SQL shown to identify missing data in the right table is that the key is never missing in the data.

 

> why?  If the data are already sorted, then the data step merge will be faster for large data sets.

I agree in principle. If the sort is validated, the difference should be minimal though as SQL will not re-sort. 

Sadly SAS forgot to be clever here and does not set this flag as it should except in a few cases.

GN0001
Barite | Level 11

Hi,

Do we need to bring below code in? Because left join bring all from left table and all the matches from both tables.

  case thatdata.hkey
       when . then "Not_in_thisdata"
       else "Both"
     end
     as find

 Respectfully,

Blublue

Blue Blue
ChrisNZ
Tourmaline | Level 20

> Do we need to bring below code in?

The code you show will identify records present in the left table and not present in the right table, provided the data does not contain missing key values.

ChrisNZ
Tourmaline | Level 20

All your questions about the different types of merge/join would be much faster, and much better, answered by your testing the syntaxes and examples you ask about. Doing is a much better way to learn than asking questions.

GN0001
Barite | Level 11

Hello,

I did test it, but I couldn't figure it out. I need an assurance.

Regards,

blueblue

Blue Blue
Kurt_Bremser
Super User

@GN0001 wrote:

Hello,

I did test it, but I couldn't figure it out. I need an assurance.

Regards,

blueblue


<SNARK>

If you tested it, and it worked, and you don't even trust your own test, then you need to pay a visit to a shrink.

</SNARK>

 

There is no better way in programming to verify a code than to test it. Trusting another person that took a casual glance is foolish at best.

 

If you expect a trustworthy answer, we would need to test your whole code against your whole data in your environment. And then we'd start charging you.

GN0001
Barite | Level 11

Hello team,

 

I tried and I could view the result of both.

 

I am clear on this one.

 

Thanks,

blueblue

Blue Blue
andreas_lds
Jade | Level 19

Please mark the most helpful answer as solution.

andreas_lds
Jade | Level 19

@GN0001 wrote:

Hello team,

Can we use a proc sql and left join for this code below?

Regards,

blueblue

data out.mydata;
merge thisdata(in=a) thatdata(in=b);
by hkey;
format find $15.;
if a and b then Find = 'Both';
if a and not b then Find = 'not_in_thisdata'
if a then output;
run;

So, you have a working data step, tested and giving the expected results. Why do you want to waste your time replacing it at all?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2106 views
  • 7 likes
  • 5 in conversation