Hi!
I recently discovered the use of Hash Object. This is fast and easy to use instrument.
But I'm straggling to find out how to Left Join two tables.
Previous topics provide solution to the specific problems, but doesn't provide clear explanation what and how they doing this.
Here is my attempt to make left join but it doesn't work well - there are values in places that supposed to have NULL.
Please, help me to modify the code to make the correct Left join with missing values.
Data Tviot_2;
If 0 Then Set Tviot_1
Tom_Tkufat_Bituach_Taarif
Tashlum_24
Schum_Tvia_Magash;
If _N_ = 1 Then Do;
Declare Hash TTB(Dataset: "Tom_Tkufat_Bituach_Taarif");
TTB.DefineKey("MS_POL");
TTB.DefineData("TR_HATCHLAT_BITUACH","TR_PRODUKTZIA","TR_TOM_BITUACH");
TTB.DefineDone();
Declare Hash TSHL24(Dataset: "Tashlum_24");
TSHL24.DefineKey("MS_TVIA");
TSHL24.DefineData("MONE_TASHLUMIM","MS_TASHLUM_24","TRD_TASHLUM_24_KAVUA");
TSHL24.DefineDone();
Declare Hash MGSH(Dataset: "Schum_Tvia_Magash");
MGSH.DefineKey("MS_TVIA");
MGSH.DefineData("Schum_Tvia_Magash","TKUFAT_BITUACH");
MGSH.DefineDone();
End;
Set Tviot_1;
If TTB.Find() = 0
OR TSHL24.Find() = 0
OR MGSH.Find() = 0
Then Output;
Else Output;
Run;
Variables sourced from input datasets are always "retain"ed.
So you need to either clear them to missing before you read the "LEFT" table.
if 0 then set LEFT RIGHT1 RIGHT2 ...;
call missing(of _all_);
set left;
Or clear them to missing when the FIND() method fails. Since FIND() returns a TRUE value when the value is NOT found the code would look like this:
if RIGHT1.find() then call missing(TR_HATCHLAT_BITUACH,TR_PRODUKTZIA,TR_TOM_BITUACH);
if RIGHT2.find() then call missing(....);
So the generalized pattern might look like this:
data want;
if 0 then set left
right1(keep=r1data1 r1data2)
right2(keep=r1data2 r2data2)
;
set left;
if _n_=1 then do;
declare hash right1(dataset:'right1');
right1.definekey('key1');
right1.definedata('r1data1','r1data2');
right1.definedone();
declare hash right2(dataset:'right2');
right2.definekey('key2');
right2.definedata('r2data1','r2data2');
right2.definedone();
end;
if right1.find() then call missing(r1data1,r1data2);
if right2.find() then call missing(r2data1,r2data2);
run;
Given your code I'm making the following assumptions of what you have and want
1. Your left table where you want to keep all rows is the one in the set statement - Tviot_1
2. The relationship of your left table to all hash tables is one to zero or one - given you didn't use hash option multidata:'y' that's what it must be
You don't need to call hash methods as part of an expression.
If you define the variables in the hash to the PDV via IF 0 then set <lookup tables>; then you must set these variables to missing at the beginning of the data step as else the value from the last successful lookup will get retained.
Below amended code will return a left join.
data tviot_2;
if _n_ = 1 then
do;
if 0 then set tviot_1 tom_tkufat_bituach_taarif tashlum_24 schum_tvia_magash;
declare hash ttb(dataset: "tom_tkufat_bituach_taarif");
ttb.definekey("ms_pol");
ttb.definedata("tr_hatchlat_bituach","tr_produktzia","tr_tom_bituach");
ttb.definedone();
declare hash tshl24(dataset: "tashlum_24");
tshl24.definekey("ms_tvia");
tshl24.definedata("mone_tashlumim","ms_tashlum_24","trd_tashlum_24_kavua");
tshl24.definedone();
declare hash mgsh(dataset: "schum_tvia_magash");
mgsh.definekey("ms_tvia");
mgsh.definedata("schum_tvia_magash","tkufat_bituach");
mgsh.definedone();
end;
call missing(of _all_);
set tviot_1;
_rc=ttb.find();
_rc=tshl24.find();
_rc=mgsh.find();
/* if you want an IF condition that's always true: Just add 1 to the expression (1: True, 0:False*/
/* if 1 or ttb.find() or tshl24.find() or mgsh.find(); */
drop _rc;
run;
The problem is that the variables fetched from the hash tables are retained.
You can solve the problem by setting the data variables missing, e.g.:
Data Tviot_2;
If 0 Then Set Tviot_1
Tom_Tkufat_Bituach_Taarif
Tashlum_24
Schum_Tvia_Magash;
If _N_ = 1 Then Do;
Declare Hash TTB(Dataset: "Tom_Tkufat_Bituach_Taarif");
TTB.DefineKey("MS_POL");
TTB.DefineData("TR_HATCHLAT_BITUACH","TR_PRODUKTZIA","TR_TOM_BITUACH");
TTB.DefineDone();
Declare Hash TSHL24(Dataset: "Tashlum_24");
TSHL24.DefineKey("MS_TVIA");
TSHL24.DefineData("MONE_TASHLUMIM","MS_TASHLUM_24","TRD_TASHLUM_24_KAVUA");
TSHL24.DefineDone();
Declare Hash MGSH(Dataset: "Schum_Tvia_Magash");
MGSH.DefineKey("MS_TVIA");
MGSH.DefineData("Schum_Tvia_Magash","TKUFAT_BITUACH");
MGSH.DefineDone();
End;
Set Tviot_1;
call missing(of TR_HATCHLAT_BITUACH--TKUFAT_BITUACH);
If TTB.Find() = 0
OR TSHL24.Find() = 0
OR MGSH.Find() = 0
Then Output;
Else Output;
Run;
Edit note: added "of" in CALL MSÍSSING.
Variables sourced from input datasets are always "retain"ed.
So you need to either clear them to missing before you read the "LEFT" table.
if 0 then set LEFT RIGHT1 RIGHT2 ...;
call missing(of _all_);
set left;
Or clear them to missing when the FIND() method fails. Since FIND() returns a TRUE value when the value is NOT found the code would look like this:
if RIGHT1.find() then call missing(TR_HATCHLAT_BITUACH,TR_PRODUKTZIA,TR_TOM_BITUACH);
if RIGHT2.find() then call missing(....);
So the generalized pattern might look like this:
data want;
if 0 then set left
right1(keep=r1data1 r1data2)
right2(keep=r1data2 r2data2)
;
set left;
if _n_=1 then do;
declare hash right1(dataset:'right1');
right1.definekey('key1');
right1.definedata('r1data1','r1data2');
right1.definedone();
declare hash right2(dataset:'right2');
right2.definekey('key2');
right2.definedata('r2data1','r2data2');
right2.definedone();
end;
if right1.find() then call missing(r1data1,r1data2);
if right2.find() then call missing(r2data1,r2data2);
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.