BookmarkSubscribeRSS Feed
GN0001
Barite | Level 11

Hello team,

Would would be the context left and right join in SAS without using in proc sql?

Regards,

Blue blue

Blue Blue
8 REPLIES 8
GN0001
Barite | Level 11

Thanks for sending the link.

It is very hard to understand from this document. I have read it several times!

Regards,

blueblue

Blue Blue
ballardw
Super User

You add dataset option IN= to each data set to identify if the current observation comes from the desired set. The select for output those that you want.

 

Dummy code:

Data want;
   merge data1 (in=inone)
              data2 (in=intwo)
   ;
   by keyvariable;
  /*in a given data step likely only want one of the following
    for most cases.
  */
  /* select records where data1 contributed*/
  if inone;
  /* or select records where data2 contributed*/
  if intwo;
  /* or select records where both data1 and data2*/
  if inone and intwo;
  /* select records from data1 that had no match in data2*/
  if inone and not intwo;
run;

The IN variables are basically boolean: 1/0 for Yes/no so can be used in if statements. (Or summed to identify how many data sets contributed to given record since a MERGE statement can have many data sets ).

However the Merge is still not the same as a "join on" when by values are duplicated in both sets.

Kurt_Bremser
Super User

Basically there isn't, because SQL works different when a many-to-many relationship is encountered.

A DATA step can only replicate a left or right join when there is either a one-to-one, one-to-many or many-to-one relationship.

Once that is verified, read the documentation of the MERGE Statement 

mkeintz
PROC Star

As @Kurt_Bremser pointed out, if you have a 1:many, many:1, or 1:1 for every key value, then a MERGE statement can easily work like a left join.

 

But many-to-many takes more effort, and use of a hash object.

 


data want;

  set right (in=inR)  left (in=inL);
  by key;

  if _n_=1 then do;
    declare hash r (dataset:'right (obs=0)',multidata:'Y');
      r.definekey('key');
      r.definedata(all:'Y');
      r.definedone();
  end;
  if first.key then r.clear();

  if inR then r.add();
  if inL ;


  if r.num_items=0 then output;
  else do rc=r.find() by 0 until (r.find_next()^=0);
    output;
  end;
  call missing(of _all_);
run;

Basically, for each KEY, all the right-side observations are read first and loaded into a hash object in memory.  Then for each subsequent left observation, the program iterates through the hash object, retrieving values from the right-side and outputting for each iteration.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
GN0001
Barite | Level 11

Hello,

Basically, for each KEY, all the right-side observations are read first and loaded into a hash object in memory.  Then for each subsequent left observation, the program iterates through the hash object, retrieving values from the right-side and outputting for each iteration.

 

Does this 

set right (in=inR)  left (in=inL);
  by key;

determine which table should be read first?

If we had:

set Left (in=inL) right(in=inR)

Is the top reads the right table first even if it is the second table in a row?

Respectfully,

blueblue

Blue Blue
mkeintz
PROC Star

@GN0001 wrote:

Hello,

Basically, for each KEY, all the right-side observations are read first and loaded into a hash object in memory.  Then for each subsequent left observation, the program iterates through the hash object, retrieving values from the right-side and outputting for each iteration.

 

Does this 

set right (in=inR)  left (in=inL);
  by key;

determine which table should be read first?

If we had:

set Left (in=inL) right(in=inR)

Is the top reads the right table first even if it is the second table in a row?

Respectfully,

blueblue


 

 

This is one of those questions that can well be answered by

  1. Examining the Documentation on the SET statement 
      Take notice of examples 1: concatenating data sets, and 2: interleaving data sets.

    and/or

  2. Running some tests - one of the great features of learning any programming language or programming tool like SAS.

 

More generally, since you want to do a left join, you first want to load the RIGHT data into memory, so that it can be repeatedly and inexpensively looped over for each su7bsequent matching LEFT observation.  This in turn means you have to read the RIGHT prior to the LEFT, in order to put it in memory.  Thus "set right left".  The opposite ("set left right") would yield NO data from RIGHT, since it isn't in memory when processing the LEFT dataset.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 669 views
  • 5 likes
  • 5 in conversation