DATA Step, Macro, Functions and more

many-to-many left outer join with hash table question

Reply
Contributor
Posts: 22

many-to-many left outer join with hash table question

I was trying to do a many-to-many hash join. 

 

use sashelp.class as base table, I added extra row to get a data set with duplicated Name:

data work.class;
set sashelp.class end=eof;
if eof then do;
  Name = 'Alice'  ;
  Sex='F'  ;
  age=20  ;
  height=60  ;
  weight=159 ;
end;
run;

 

Then, I created a lookup table:

data work.lkup;
input name $ Var1 Var2;
datalines;
Alice  12  54
Jack   13  56
James  4   6
Alice  37  25
;
run;

 

I managed to created a left join by using the following code:

data work.test;
set work.class;

if _N_ =1 then do;
  if 0 then set lkup;
  declare hash h(dataset: 'lkup', hashExp:16 , multidata: 'y');
  h.defineKey('name');;
  h.defineData(all:'y');
  h.defineDone();
end;

if h.find() then do;
  call missing (of _all_);
output;
end;
if not h.find();output;
if not h.find_next();output;

run;

 

but the bolded call missing statement clear out all of the variable, both data from class and lkup. 

 

So instead of manually using the following code to clear out non-matched results

call missing (var1 var2);

is there any other way I can call missing var1 and var2 easily? (I am trying to apply the logic by using a lookup table with lots of variables.)

 

 

Thanks.

Contributor
Posts: 22

Re: many-to-many left outer join with hash table question

I reckon I can add an array for all the variables from lkup and then call missing using the array. 

array lkup {*} var1 -- var3;
...
call missing (of lkup {*});
...


However, this is not ideal as I still have to define the first and last variables manually.  Better solutions are welcome.

 

Super User
Posts: 5,516

Re: many-to-many left outer join with hash table question

[ Edited ]

Technically, this will not work anyway since you can't mix numeric and character variables in the same array.  However, you can define two arrays, as long as you do it in the proper spot within the program:

 

array nums {*} _numeric_;

array chars {*} _character_;

 

The array statements must be placed early in the program.  Just before the initial SET statement, you can insert:

 

if 5=4 then do;.  

   set lkup;

   ** both array statements;

end;

 

That way, their definition will be limited to the variables in the lookup data set.  You will need two call missings later (one for nums{*} and one for chars{*}), and your program may still fail for a data set that contains only one type of variable.

Contributor
Posts: 22

Re: many-to-many left outer join with hash table question

I also noticed that the

if not h.find_next(); output;

only return the 2nd result (if any) then, the 3rd result will be missed (if any);

what's the solution to this?
PROC Star
Posts: 283

Re: many-to-many left outer join with hash table question

[ Edited ]

This should meet your req:

 

data work.test;

set work.class;

if _N_ =1 then do;

if 0 then  set lkup;

declare hash h(dataset: 'lkup', hashExp:16 , multidata: 'y');

h.defineKey('name');;

h.defineData(all:'y');

h.defineDone();

end;

rc = h.find();

if rc ne 0 then output;

if (rc = 0) then do;

output;

rc = h.find_next();

do while(rc = 0);

output;

rc = h.find_next();

end;

call missing(of v: );

end;

drop rc;

run;

 

 

Trusted Advisor
Posts: 1,022

Re: many-to-many left outer join with hash table question

The technique is to take advantage of how SAS builds the program data vector.  In the program below, the italic-bold syntax assures that variables from LKUP are between variables _pre and _post.  Then the modified call missing works as you apparently intend.

 

Note: any var that is on BOTH class and LKUP will not be set to missing, since its position in the PDV will be to the left of _pre.

 

 

data work.test (drop=_pre _post);
set work.class;

 

retain _pre .;
if _N_ =1 then do;
  if 0 then set lkup;
  declare hash h(dataset: 'lkup', hashExp:16 , multidata: 'y');
  h.defineKey('name');;
  h.defineData(all:'y');
  h.defineDone();
end;

retain _post .;

if h.find() then do;
  call missing (of _pre -- _post);
output;
end;
if not h.find();output;
if not h.find_next();output;

run;

Trusted Advisor
Posts: 1,022

Re: many-to-many left outer join with hash table question

In addition to my suggestion of building the program data vector to make call missing(of _pre -- _post) work as intended, I suggest you modify the code below,

 

from

      if h.find() then do;
        call missing (of _pre--_post_);

        output;
      end;
      if not h.find();output;
      if not h.find_next();output;

 

to

      call missing (of _pre--_post_);

      rc=h.find();

      output;

      do rc=h.find_next() by 0 while (rc=0);

         output;

         rc=h.find_next();

      end;

         
    It reduces the number of explicit output statments, and it accomodates cases in which a key value occurs more than twice.

Ask a Question
Discussion stats
  • 6 replies
  • 149 views
  • 0 likes
  • 4 in conversation