BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Onyx | Level 15

I would like  to ask please -

In each of the 3 ways of solution via Hash method - Which code is telling SAS to perform LEFT JOIN?

IF I want to perform INNER JOIN-Which code need to be modified ?

IF I want to perform FULL JOIN-which code need to be modified? 

Data ABT_ALL;
input lak_id snif X y Z;
cards;
111111 987 10 20 30
222222 921 15 30 45
333333 940 20 40 60
;
Run;

Data VBM374_USED_BRANCH_CUSTOMER;
input Branch_Cust_Nbr Branch_Nbr first_Branch_Cust_IP;
cards;
222222 921 123 
333333 940 456
;
run;

/************Left join*****************************************/
/************Left join*****************************************/
/************Left join*****************************************/
data WAY1;
set ABT_ALL;
if _n_ = 1 then do;
declare hash h (dataset:"VBM374_USED_BRANCH_CUSTOMER
                       (rename=(Branch_Cust_Nbr=lak_id Branch_Nbr=snif
                        first_Branch_Cust_IP=Numerator_DWH))");
h.definekey("lak_id","snif"); 
h.definedata("Numerator_DWH");
h.definedone();
end;
if h.find() then Numerator_DWH=.;
run;


data WAY2;
set ABT_ALL;
if _n_ = 1
then do;
declare hash h (dataset:"VBM374_USED_BRANCH_CUSTOMER
                       (rename=(Branch_Cust_Nbr=lak_id Branch_Nbr=snif
                        first_Branch_Cust_IP=Numerator_DWH))");
h.definekey("lak_id","snif"); 
h.definedata("Numerator_DWH");
h.definedone();
if 0 then set VBM374_USED_BRANCH_CUSTOMER(rename=(Branch_Cust_Nbr=lak_id Branch_Nbr=snif first_Branch_Cust_IP=Numerator_DWH));
call missing(Numerator_DWH);
end;
rc = h.find();
drop rc;
run;



data WAY3;
set ABT_ALL;
if 0 then set VBM374_USED_BRANCH_CUSTOMER(keep=Branch_Cust_Nbr Branch_Nbr first_Branch_Cust_IP);
if _n_ = 1 then do;
declare hash h (dataset:"VBM374_USED_BRANCH_CUSTOMER");
h.definekey("Branch_Cust_Nbr","Branch_Nbr"); 
h.definedata("first_Branch_Cust_IP");
h.definedone();
end;
if not h.find(key:lak_id,key:snif) then Numerator_DWH=first_Branch_Cust_IP;
drop Branch_Cust_Nbr Branch_Nbr first_Branch_Cust_IP ;
run;
Tom
Super User Tom
Super User

Your original problem was doing a LOOKUP.  Not really any type of "JOIN".  If it wasn't for the fact that you have two key variables instead of one then you could have done that by making a FORMAT or an INFORMAT instead.

 

None of the data steps you posted are doing what SQL calls a JOIN.  They do not handle the situation where both datasets could have multiple observations for the same set of key values.  But as long as the small dataset (the one you put into the hash) has unique observations then it is doing LEFT (or RIGHT what is the difference really) JOIN since you keep all of the observations from the main dataset and just add on the values extracted from the dataset that is stored in the hash.

 

To handle multiple observations for the same key values you would have to define the hash to allow that.  And then make multple calls to the .FIND() method until you no longer retrieve another observation.  So you would need to add a loop and output statement(s).

Tom
Super User Tom
Super User

That middle one does not need the IF 0 THEN SET statement.

You are reading in only 3 variables.  2 of them are the KEY variables so those are already defined by the SET for the non-hashed dataset.  And the third is a numeric variable, so your  use of it in the CALL MISSING() statement will define it already.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 4837 views
  • 7 likes
  • 5 in conversation