BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
IgorR
Quartz | Level 8

Hi!

I recently discovered the use of Hash Object. This is fast and easy to use instrument.

But I'm straggling to find out how to Left Join two tables.

Previous topics provide solution to the specific problems, but doesn't provide clear explanation what and how they doing this.

Here is my attempt to make left join but it doesn't work well - there are values in places that supposed to have NULL.

Please, help me to modify the code to make the correct Left join with missing values.

 

Data Tviot_2;
If 0 Then Set Tviot_1
			  Tom_Tkufat_Bituach_Taarif
			  Tashlum_24
			  Schum_Tvia_Magash;

If _N_ = 1 Then Do;
	Declare Hash TTB(Dataset: "Tom_Tkufat_Bituach_Taarif");
	TTB.DefineKey("MS_POL");
	TTB.DefineData("TR_HATCHLAT_BITUACH","TR_PRODUKTZIA","TR_TOM_BITUACH");
	TTB.DefineDone();

	Declare Hash TSHL24(Dataset: "Tashlum_24");
	TSHL24.DefineKey("MS_TVIA");
	TSHL24.DefineData("MONE_TASHLUMIM","MS_TASHLUM_24","TRD_TASHLUM_24_KAVUA");
	TSHL24.DefineDone();

	Declare Hash MGSH(Dataset: "Schum_Tvia_Magash");
	MGSH.DefineKey("MS_TVIA");
	MGSH.DefineData("Schum_Tvia_Magash","TKUFAT_BITUACH");
	MGSH.DefineDone();
End;

Set Tviot_1;
	If TTB.Find()    = 0
		OR TSHL24.Find() = 0
		OR MGSH.Find()   = 0
	Then Output;
	Else Output;

Run;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Variables sourced from input datasets are always "retain"ed.  

So you need to either clear them to missing before you read the "LEFT" table.

if 0 then set LEFT RIGHT1 RIGHT2 ...;
call missing(of _all_);
set left;

Or clear them to missing when the FIND() method fails.  Since FIND() returns a TRUE value when the value is NOT found the code would look like this:

if RIGHT1.find() then call missing(TR_HATCHLAT_BITUACH,TR_PRODUKTZIA,TR_TOM_BITUACH);
if RIGHT2.find() then call missing(....);

So the generalized pattern might look like this:

data want;
  if 0 then set left
    right1(keep=r1data1 r1data2)
    right2(keep=r1data2 r2data2)
  ;
  set left;
  if _n_=1 then do;
    declare hash right1(dataset:'right1');
    right1.definekey('key1');
    right1.definedata('r1data1','r1data2');
    right1.definedone();
    declare hash right2(dataset:'right2');
    right2.definekey('key2');
    right2.definedata('r2data1','r2data2');
    right2.definedone();
  end;
  if right1.find() then call missing(r1data1,r1data2);
  if right2.find() then call missing(r2data1,r2data2);
run;

View solution in original post

3 REPLIES 3
Patrick
Opal | Level 21

Given your code I'm making the following assumptions of what you have and want

1. Your left table where you want to keep all rows is the one in the set statement - Tviot_1

2. The relationship of your left table to all hash tables is one to zero or one - given you didn't use hash option multidata:'y' that's what it must be

 

You don't need to call hash methods as part of an expression.

If you define the variables in the hash to the PDV via IF 0 then set <lookup tables>; then you must set these variables to missing at the beginning of the data step as else the value from the last successful lookup will get retained.

 

Below amended code will return a left join.

data tviot_2;
  if _n_ = 1 then
    do;
      if 0 then set tviot_1 tom_tkufat_bituach_taarif tashlum_24 schum_tvia_magash;
      declare hash ttb(dataset: "tom_tkufat_bituach_taarif");
      ttb.definekey("ms_pol");
      ttb.definedata("tr_hatchlat_bituach","tr_produktzia","tr_tom_bituach");
      ttb.definedone();

      declare hash tshl24(dataset: "tashlum_24");
      tshl24.definekey("ms_tvia");
      tshl24.definedata("mone_tashlumim","ms_tashlum_24","trd_tashlum_24_kavua");
      tshl24.definedone();

      declare hash mgsh(dataset: "schum_tvia_magash");
      mgsh.definekey("ms_tvia");
      mgsh.definedata("schum_tvia_magash","tkufat_bituach");
      mgsh.definedone();
    end;
  call missing(of _all_);

  set tviot_1;

  _rc=ttb.find();
  _rc=tshl24.find();
  _rc=mgsh.find();

  /* if you want an IF condition that's always true: Just add 1 to the expression (1: True, 0:False*/
  /* if 1 or ttb.find() or tshl24.find() or mgsh.find(); */

  drop _rc;
run;

 

s_lassen
Meteorite | Level 14

The problem is that the variables fetched from the hash tables are retained.

 

You can solve the problem by setting the data variables missing, e.g.:

Data Tviot_2;
If 0 Then Set Tviot_1
			  Tom_Tkufat_Bituach_Taarif
			  Tashlum_24
			  Schum_Tvia_Magash;

If _N_ = 1 Then Do;
	Declare Hash TTB(Dataset: "Tom_Tkufat_Bituach_Taarif");
	TTB.DefineKey("MS_POL");
	TTB.DefineData("TR_HATCHLAT_BITUACH","TR_PRODUKTZIA","TR_TOM_BITUACH");
	TTB.DefineDone();

	Declare Hash TSHL24(Dataset: "Tashlum_24");
	TSHL24.DefineKey("MS_TVIA");
	TSHL24.DefineData("MONE_TASHLUMIM","MS_TASHLUM_24","TRD_TASHLUM_24_KAVUA");
	TSHL24.DefineDone();

	Declare Hash MGSH(Dataset: "Schum_Tvia_Magash");
	MGSH.DefineKey("MS_TVIA");
	MGSH.DefineData("Schum_Tvia_Magash","TKUFAT_BITUACH");
	MGSH.DefineDone();
End;

Set Tviot_1;
call missing(of TR_HATCHLAT_BITUACH--TKUFAT_BITUACH);
	If TTB.Find()    = 0
		OR TSHL24.Find() = 0
		OR MGSH.Find()   = 0
	Then Output;
	Else Output;

Run;

 Edit note: added "of" in CALL MSÍSSING.

Tom
Super User Tom
Super User

Variables sourced from input datasets are always "retain"ed.  

So you need to either clear them to missing before you read the "LEFT" table.

if 0 then set LEFT RIGHT1 RIGHT2 ...;
call missing(of _all_);
set left;

Or clear them to missing when the FIND() method fails.  Since FIND() returns a TRUE value when the value is NOT found the code would look like this:

if RIGHT1.find() then call missing(TR_HATCHLAT_BITUACH,TR_PRODUKTZIA,TR_TOM_BITUACH);
if RIGHT2.find() then call missing(....);

So the generalized pattern might look like this:

data want;
  if 0 then set left
    right1(keep=r1data1 r1data2)
    right2(keep=r1data2 r2data2)
  ;
  set left;
  if _n_=1 then do;
    declare hash right1(dataset:'right1');
    right1.definekey('key1');
    right1.definedata('r1data1','r1data2');
    right1.definedone();
    declare hash right2(dataset:'right2');
    right2.definekey('key2');
    right2.definedata('r2data1','r2data2');
    right2.definedone();
  end;
  if right1.find() then call missing(r1data1,r1data2);
  if right2.find() then call missing(r2data1,r2data2);
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 531 views
  • 7 likes
  • 4 in conversation