The behavior of the hash left join using the _N_=0 trick does produce unexpected results.
data hleftjoin;
IF _N_=0 THEN SET RightData;
IF _N_=1 THEN DO;
declare hash lookupdata;
lookupdata = _new_ hash(dataset:"RightData", ordered:"yes");
lookupdata.defineKey('Key1','Key2');
lookupdata.defineData('leftCol1','leftCol2');
lookupdata.defineDone();
END;
SET LEftData end=EOF_daily;
rc = lookupdata.find(KEY: Key1, KEY: Key2);
RUN;
produces for Key1 and Key2 data in LeftData with a rc value greater than 0 (ie not found in RightData) non-missing leftCol1 and LeftCol2 data. It seems to put the last leftCol1 &2 values in it. This can be easyly taken care of with a IF rc ne 0 THEN Call missing(LeftCol1, LeftCol2). It doesn't help to put a CALL MISSING right after the hash declaration (_N_=1).
However this doesn't occur when instead of the _N_=0 trick you use the LENGTH statement (LENGHT LeftCol1 LeftCol2 😎 or FORMAT statement.
I cannot make sense of it and this is error is not easly found.
That's a characteristic of DATA steps in general, not of hashing in particular.
Any variable that comes from a SAS data set is automatically retained. Doesn't matter whether the SAS data set is mentioned in a SET vs. MERGE vs. UPDATE statement ... if the source is a SAS data set, the variable is retained.
So when the SET statement appears, all variables from RightData are automatically retained. The rest of it (when it happens, how to handle) it looks like you have figured out.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.