I was trying to do a many-to-many hash join.
use sashelp.class as base table, I added extra row to get a data set with duplicated Name:
data work.class; set sashelp.class end=eof; if eof then do; Name = 'Alice' ; Sex='F' ; age=20 ; height=60 ; weight=159 ; end; run;
Then, I created a lookup table:
data work.lkup; input name $ Var1 Var2; datalines; Alice 12 54 Jack 13 56 James 4 6 Alice 37 25 ; run;
I managed to created a left join by using the following code:
data work.test; set work.class; if _N_ =1 then do; if 0 then set lkup; declare hash h(dataset: 'lkup', hashExp:16 , multidata: 'y'); h.defineKey('name');; h.defineData(all:'y'); h.defineDone(); end; if h.find() then do; call missing (of _all_); output; end; if not h.find();output; if not h.find_next();output; run;
but the bolded call missing statement clear out all of the variable, both data from class and lkup.
So instead of manually using the following code to clear out non-matched results
call missing (var1 var2);
is there any other way I can call missing var1 and var2 easily? (I am trying to apply the logic by using a lookup table with lots of variables.)
Thanks.
I reckon I can add an array for all the variables from lkup and then call missing using the array.
array lkup {*} var1 -- var3; ... call missing (of lkup {*}); ...
However, this is not ideal as I still have to define the first and last variables manually. Better solutions are welcome.
Technically, this will not work anyway since you can't mix numeric and character variables in the same array. However, you can define two arrays, as long as you do it in the proper spot within the program:
array nums {*} _numeric_;
array chars {*} _character_;
The array statements must be placed early in the program. Just before the initial SET statement, you can insert:
if 5=4 then do;.
set lkup;
** both array statements;
end;
That way, their definition will be limited to the variables in the lookup data set. You will need two call missings later (one for nums{*} and one for chars{*}), and your program may still fail for a data set that contains only one type of variable.
This should meet your req:
data work.test;
set work.class;
if _N_ =1 then do;
if 0 then set lkup;
declare hash h(dataset: 'lkup', hashExp:16 , multidata: 'y');
h.defineKey('name');;
h.defineData(all:'y');
h.defineDone();
end;
rc = h.find();
if rc ne 0 then output;
if (rc = 0) then do;
output;
rc = h.find_next();
do while(rc = 0);
output;
rc = h.find_next();
end;
call missing(of v: );
end;
drop rc;
run;
The technique is to take advantage of how SAS builds the program data vector. In the program below, the italic-bold syntax assures that variables from LKUP are between variables _pre and _post. Then the modified call missing works as you apparently intend.
Note: any var that is on BOTH class and LKUP will not be set to missing, since its position in the PDV will be to the left of _pre.
data work.test (drop=_pre _post);
set work.class;
retain _pre .;
if _N_ =1 then do;
if 0 then set lkup;
declare hash h(dataset: 'lkup', hashExp:16 , multidata: 'y');
h.defineKey('name');;
h.defineData(all:'y');
h.defineDone();
end;
retain _post .;
if h.find() then do;
call missing (of _pre -- _post);
output;
end;
if not h.find();output;
if not h.find_next();output;
run;
In addition to my suggestion of building the program data vector to make call missing(of _pre -- _post) work as intended, I suggest you modify the code below,
from
if h.find() then do;
call missing (of _pre--_post_);
output;
end;
if not h.find();output;
if not h.find_next();output;
to
call missing (of _pre--_post_);
rc=h.find();
output;
do rc=h.find_next() by 0 while (rc=0);
output;
rc=h.find_next();
end;
It reduces the number of explicit output statments, and it accomodates cases in which a key value occurs more than twice.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.