The main reason you cannot replicate the merge with a hash object is that MERGE is a "match-merge" process. In a many-to-many match, as in your example, MERGE combines the first matching obs from each dataset, then the second matching obs from each dataset, and so on.
When one dataset has the shorter series for a key (as year2021_ww01 does here), the last observation of that shorter series is matched with every "excess" obs in the longer series.
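To make that behavior concrete, here is a minimal sketch with made-up data. The dataset names LONG_SERIES and SHORT_SERIES and the variables X and Y are hypothetical, not taken from your program.

```sas
/* Hypothetical data: three obs for ID=A in one dataset, two in the other */
data long_series;
  input id $ x;
  datalines;
A 1
A 2
A 3
;

data short_series;
  input id $ y;
  datalines;
A 10
A 20
;

/* Match-merge: obs are paired 1st-with-1st, 2nd-with-2nd within the BY group. */
/* When SHORT_SERIES runs out, its last obs (y=20) is carried forward.         */
data merged;
  merge long_series short_series;
  by id;
run;

/* MERGED contains: (A,1,10), (A,2,20), (A,3,20) */
proc print data=merged;
run;
```

The log also carries a note that more than one dataset has repeats of BY values, which is SAS flagging exactly this many-to-many situation.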
BTW, your merge suggests that the hash object should be for year2021_ww01, not product.
But even then your code would fail, because the hash object defaults to storing one dataitem (i.e. one "row") per key combination, so you don't even keep the entire sequence of obs from year2021_ww01 for any repeated ID/FILE_NM.
So you need to:
1. Accommodate more than one obs per key combination. Use the multidata:"Y" option.
2. Match the first obs of the PRODUCT dataset with the first matching dataitem in the hash object, match the second with the second, etc. This is done by using the h.find() method to find the first dataitem and then deleting it (h.removedup()), so that the next dataitem becomes the first one to satisfy the next h.find().
3. Carry forward the last matching dataitem in the hash object to match any "extra" obs in PRODUCT. So do not call h.removedup() on the last dataitem for a given key.
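data want (drop=_:);
  /* obs=0 reads no rows from year2021_ww01; it only adds its variables to the PDV */
  set product year2021_ww01 (obs=0);
  if _n_=1 then do;
    /* multidata:'Y' keeps every obs per ID/FILE_NM, not just one */
    declare hash h (dataset:'year2021_ww01',multidata:'Y');
    h.definekey('id','file_nm');
    h.definedata(all:'Y');
    h.definedone();
  end;
  /* find() retrieves the first remaining dataitem for this key */
  if h.find()=0 then do;
    /* Remove the dataitem just used unless it is the last one for the key, */
    /* so that the last one is carried forward to any extra PRODUCT obs     */
    h.has_next(result:_duplicate_available);
    if _duplicate_available then h.removedup();
  end;
  /* If find() fails, variables that exist only in year2021_ww01 keep their   */
  /* previous values; reset them with CALL MISSING if unmatched keys can occur */
run;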
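If you want to check the step above against MERGE, one option is to run both on a small test case and compare the results. The sketch below uses made-up data: P_VAL and Y_VAL are hypothetical variable names, and the sample is already in ID/FILE_NM order so that MERGE needs no sort. Create these two datasets, run the DATA WANT step above, then run the comparison.

```sas
/* Hypothetical test data: key (1,a) has 3 obs in PRODUCT but only 2 in YEAR2021_WW01 */
data product;
  input id file_nm $ p_val;
  datalines;
1 a 101
1 a 102
1 a 103
2 b 201
;

data year2021_ww01;
  input id file_nm $ y_val;
  datalines;
1 a 11
1 a 12
2 b 21
;

/* ... run the DATA WANT step shown above here ... */

/* Reference result from an ordinary match-merge */
data want_merge;
  merge product year2021_ww01;
  by id file_nm;
run;

/* Should report no unequal values if WANT reproduces the merge for this data */
proc compare base=want_merge compare=want;
run;
```

This only exercises the case where PRODUCT has the longer (or equal) series for every key, which is the case the hash step is designed for.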
A couple of notes:
The above does not require either dataset to be sorted, which, as far as I can tell, is the only real advantage of using hash over MERGE in this particular application.
If any key combination has a longer series in YEAR2021_WW01 than in PRODUCT, then the above will NOT replicate the MERGE statement. MERGE would repeat the last matching obs in PRODUCT to fill out the longer series in YEAR2021_WW01, while this program would ignore the extra dataitems in the hash object. Handling that would take a bit more code.