I am trying to apply some code presented by Kenneth W. Borowiak at NESUG 2006 to see if I can use the hash object more efficiently and avoid a memory issue. As Kenneth mentions in his paper, the idea was originally presented by Paul Dorfman and Lessia Shajenko in a 2006 paper.
Both papers assume that the data is already available for loading into the hash, with something like the following (taken from Kenneth's paper):
do until(eof_SubsetMe) ;
   set SubsetMe end=eof_SubsetMe ;
   /* each record is loaded into the hash inside this loop */
end ;
In my case, I don't have the data beforehand. I want my hash table to be loaded whenever a key is not found in the hash. My modified code is pasted below. When I run the code, it seems to get the value of 'n' correctly, but it always seems to point to the last record in maps.&plan._UPD_UID_HEAD_PHM_XWalk.
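Unlike a SAS data set, a hash object can be searched and added to within the same DATA step, so the "load on miss" behavior described above can be sketched with FIND() and ADD(). This is a minimal sketch, not the original poster's code: the input data set WORK.INCOMING, the key MEMBER_ID, and the sequential UID are hypothetical stand-ins for the crosswalk.

```sas
/* Hypothetical input: one key per row, some keys repeat */
data work.incoming;
   input member_id $;
   datalines;
A
B
A
C
B
;
run;

data work.xwalk_out;
   /* Declare the hash once; it persists across DATA step iterations */
   if _n_ = 1 then do;
      declare hash xw();
      xw.defineKey('member_id');
      xw.defineData('member_id', 'uid');
      xw.defineDone();
      call missing(uid);     /* define UID in the PDV as numeric */
   end;

   set work.incoming end=last;

   /* FIND() returns 0 on a hit and copies the stored UID into the PDV */
   if xw.find() ne 0 then do;
      next_uid + 1;          /* sum statement: implicitly retained */
      uid = next_uid;
      rc = xw.add();         /* visible to FIND() on later iterations */
   end;

   /* Persist the finished crosswalk after the last input row */
   if last then rc = xw.output(dataset: 'work.xwalk');
   drop rc next_uid;
run;
```

Each repeated key receives the UID assigned on its first occurrence, because the ADD() takes effect immediately inside the step.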
Could there be a conflict between the DATA step pointer and the direct access pointer? Once the "maps.&plan._UPD_UID_HEAD_PHM_XWalk" dataset is named on the SET statement, can it not be updated (by the OUTPUT statement)? And if it can, will the SET statement re-read it? I feel I am missing something fundamental about how the DATA step works. Any help here is greatly appreciated.
My apologies in advance if I have left out any information that might be helpful. Please let me know if you need anything else.
The answer to your fundamental question, whether you can dynamically update a SAS dataset with OUTPUT and then reference that same dataset in the same DATA step, is no. The dataset named on the DATA statement is not the same physical file that the SET statement reads: SAS writes the output to a new copy, and that copy replaces the permanent dataset only when the DATA step completes.
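A small experiment illustrates this. In the sketch below, WORK.MASTER is a hypothetical one-row dataset, and the second step both reads and writes it. If SET were reading the file being written, every output row would be read back and the step would never end; instead SET reads only the original row, and the two output rows replace WORK.MASTER when the step finishes.

```sas
/* Start with a one-row dataset */
data work.master;
   key = 1;
run;

/* Read and write WORK.MASTER in the same step */
data work.master;
   set work.master;   /* reads the original copy only */
   output;            /* copy the existing row (key = 1) */
   key = key + 10;
   output;            /* add a new row (key = 11) to the copy being built */
run;
```

After the step, WORK.MASTER holds two rows with KEY values 1 and 11.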
You will need to pre-process your new input data, creating a suitable "interim master file" for further processing. Options to consider for the look-up are a hash table, a PROC FORMAT with a PUT function look-up, and, as you demonstrated, a SET with KEY= to find a matching observation.
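As one illustration of the PROC FORMAT option, the sketch below builds a character format from a look-up table with CNTLIN= and then uses PUT() to look up keys. The datasets WORK.XWALK, WORK.FMT_CTL and WORK.LOOKUP and the format name $UIDFMT are hypothetical; keys missing from the table map to a blank through an OTHER range (HLO='O').

```sas
/* Hypothetical look-up table: member_id -> uid */
data work.xwalk;
   input member_id $ uid;
   datalines;
A 1
B 2
;
run;

/* Control dataset for CNTLIN=: one row per key, plus an OTHER row */
data work.fmt_ctl;
   set work.xwalk end=last;
   retain fmtname 'uidfmt' type 'C';   /* type C gives format $UIDFMT */
   start = member_id;
   label = cats(uid);                  /* numeric UID stored as text */
   output;
   if last then do;
      hlo   = 'O';                     /* OTHER: keys not in the table */
      label = ' ';
      output;
   end;
   keep fmtname type start label hlo;
run;

proc format cntlin=work.fmt_ctl;
run;

/* Look up each key with PUT(); a blank result means "not found" */
data work.lookup;
   input member_id $;
   uid_txt = put(member_id, $uidfmt.);
   found   = (uid_txt ne ' ');
   datalines;
A
C
;
run;
```

Like a SET with KEY=, the format is built from a finished dataset, so it still has to be rebuilt if new keys are added later.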
I have updated my process along the lines Scott suggested above. Basically, I moved the "if hj.find() = 0" DO block into a separate DATA step that runs after the first one. Now my process works the way I intended.
I will still look into the MODIFY statement, because it could let me avoid reading the two datasets a second time. I need to check whether direct access works with MODIFY, and whether anything else might cause issues.
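For the MODIFY route, a common pattern pairs SET on the transactions with MODIFY ... KEY= on the master and branches on _IORC_: REPLACE updates a matched observation in place, and OUTPUT appends a new one. This is a sketch under assumptions: WORK.MASTER, WORK.TRANS, MEMBER_ID and the counter CNT are hypothetical, and KEY= requires an index on MEMBER_ID. Because MODIFY updates the dataset in place rather than writing a replacement copy, it avoids the separate output file described earlier; whether a row appended this way is found by a later KEY= lookup in the same step is worth confirming in the MODIFY documentation before relying on it.

```sas
/* Hypothetical master, indexed on MEMBER_ID (KEY= needs an index) */
data work.master(index=(member_id / unique));
   input member_id $ cnt;
   datalines;
A 5
B 2
;
run;

/* Hypothetical transactions: one existing key and one new key */
data work.trans;
   input member_id $;
   datalines;
A
C
;
run;

data work.master;
   set work.trans;                      /* read one transaction */
   modify work.master key=member_id;    /* indexed direct-access lookup */
   select (_iorc_);
      when (%sysrc(_sok)) do;           /* match: update in place */
         cnt = cnt + 1;
         replace;
      end;
      when (%sysrc(_dsenom)) do;        /* no match: append a new row */
         cnt = 1;
         _error_ = 0;                   /* a failed KEY= lookup sets _ERROR_ */
         output;
      end;
      otherwise do;
         put 'ERROR: unexpected return code ' _iorc_=;
         stop;
      end;
   end;
run;
```

After the step, WORK.MASTER holds A with CNT=6, B unchanged at 2, and the new key C appended with CNT=1.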