Hello @Schtroumpfette ,
No idea if you have found out meanwhile on how to proceed but here's an example on how to proceed with Hash Object Table Look-up in case of being confronted with a Cartesian product!
Suppose you have a look-up table 'LOOKUP_TABLE' of 20 000 records.
For every observation in the dataset 'HAVE', you want to scan each and every observation in the lookup table for a possible match.
You have no key-variable (no by-variable) to merge on.
Here's the classical (non-SQL) way of doing this kind of Cartesian product. It takes time!
data work.wanted;
set work.have;
do pointer = 1 to 20000;
set work.lookup_table point=pointer;
if whatever_condition_is_TRUE then output;
end;
run;
Here's how to do the same with a hash table look-up. Much faster! Especially for BIG datasets.
/* See: https://support.sas.com/resources/papers/proceedings/proceedings/forum2007/271-2007.pdf */
/* See: https://support.sas.com/resources/papers/proceedings16/10200-2016.pdf */
data work.wanted(drop=rc);
if _N_=1 then do;
declare hash h(dataset: "work.lookup_table", ordered: "A", multidata: "Y");
h.definekey ("key");
h.definedata ('var_1','var_2');
h.definedone();
call missing(var_1, var_2);
end;
set work.have;
do rc = h.find() by 0 while (rc = 0) ;
if whatever_condition_is_TRUE then output;
rc = h.find_next() ;
end ;
run;
You don't have a key variable, but the hash table requires a key-variable. Therefore, make a key variable in both of your datasets and give the key a constant value for every observation, sthg. like:
key='k';
Joining on such a key is the ultimate nxn merge (Cartesian product).
Make sure your datasets 'have' and 'lookup_table' have no variables with the same name. Do a (rename=()) if needed.
I hope you can translate this to your own concrete situation.
Cheers,
Koen
... View more