SET and OUTPUT same dataset within Datastep - An Issue?

Hello,

I am trying to apply some code presented by Kenneth W. Borowiak at NESUG 2006 to see if I can use HASH more efficiently to avoid a memory issue. As Kenneth mentions in his paper, the idea was presented by Paul Dorfman and Lessia Shajenko in a 2006 paper.

Both these papers assume that the data for the hash is available for loading into the hash, something like the code below (taken from Kenneth's paper):

do until(eof_SubsetMe) ;
   set SubsetMe end=eof_SubsetMe ;
   n+1 ;
   Sub.add() ;
end;

In my case, I don't have the data beforehand. I want an entry to be added to my hash table whenever a key is not found in the hash. My modified code is pasted below. When I run the code, it seems to get the value of 'n' correctly, but it always seems to point to the last record in maps.&plan._UPD_UID_HEAD_PHM_XWalk.

Could there be a conflict between the DATA step pointer and the direct-access pointer? Once the "maps.&plan._UPD_UID_HEAD_PHM_XWalk" dataset has been opened by SET, can it still be updated by the OUTPUT statement? And if so, will the SET see the newly written records? I feel I am missing something fundamental about how the DATA step works. Any help here is greatly appreciated.

My apologies in advance if I am not including any information that might be helpful. But please let me know if you need additional information.

My code:

data maps.&plan._UPD_UID_HEAD_PHM_XWalk;
   attrib clm_head          length=$20 label="Clm Head"
          clm_head_uid_HASH length=8   label="Clm Header UID HASH"
          map_source        length=$6.
          ;
   call missing(of _all_);
run;


options compress=no;
data clm_head
     maps.&plan._UPD_UID_HEAD_PHM_XWalk (keep=map_source clm_head
                                              clm_head_uid_HASH)
     ;

   attrib clm_head_uid        length=8 label="Clm Header UID"
          clm_head_uid_HASH   length=8 label="Clm Header UID HASH"
          Clm_Head_Match_Flag length=$1.
          last_used_uid       length=8
          n                   length=5;

   set clm_ds1 end=eof;

   retain last_used_uid %eval(&last_used_uid);

   if _n_ = 1 then do;
      declare hash hj(hashexp:32);
      hj.definekey('map_source','clm_head');
      hj.definedata('map_source','clm_head','clm_head_UID_HASH');
      hj.definedata('n');
      hj.definedone();
   end;

   set &plan._CUR_UID_HEAD_PHM_XWalk key=clm_head_dc / unique;

   if _iorc_ = 0 then do;
      Clm_Head_Match_Flag = 'Y';
   end;
   else do;
      if clm_head=' ' then do;
         Clm_Head_Match_Flag = 'Z';
         clm_head_UID = .;
      end;
      else do;
         Clm_Head_Match_Flag = 'N';

         if hj.find() eq 0 then do;
            set maps.&plan._UPD_UID_HEAD_PHM_XWalk point=n ;
            clm_head_UID = clm_head_UID_HASH;
            clm_headx = clm_head;
            Clm_Head_Match_Flag = 'A';
         end;
         else do;
            clm_head_UID_HASH = last_used_uid+1;
            clm_head_UID = clm_head_uid_HASH;
            last_used_uid = clm_head_UID_HASH;
            n+1;

            rc=hj.add();

            output maps.&plan._UPD_UID_HEAD_PHM_XWalk;
            /* if rc=0 then Clm_Head_Match_Flag = 'A';*/
         end;
      end;

      _iorc_=0;
      _error_=0;
   end;

   if eof then do;
      hj.delete();
   end;

   output clm_head;
run;

Thank you in advance.

Best,
Ravi
Super Contributor
Posts: 3,174

Re: SET and OUTPUT same dataset within Datastep - An Issue?

The answer to your fundamental question about "dynamic update" with a SAS dataset using OUTPUT and then "referencing that same SAS dataset dynamically in the same DATA step" is no. The SAS datasets mentioned on the DATA statement are not the same physical file that you are referencing with the SET statement, until the DATA step completes and the permanent copy (on the DATA statement) is replaced.
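A minimal sketch of this behavior, using a hypothetical dataset name (work.demo) that is not part of the original code. The SET statement reads the existing copy of the dataset, while OUTPUT writes to a new copy that replaces the old one only when the step ends, so the rows written during the step are never seen by the SET in that same step:

```sas
/* Hypothetical one-row dataset, for illustration only */
data work.demo;
   x = 1;
   output;
run;

data work.demo;
   set work.demo;     /* reads the existing copy (one observation)   */
   output;            /* written to the new copy                     */
   x = x + 100;
   output;            /* also written to the new copy; the SET above */
                      /* never reads it, so the step does not loop   */
run;

/* After the step completes, work.demo has two observations: x=1 and x=101 */
proc print data=work.demo;
run;
```

If SET were reading the file that OUTPUT is writing, the second step would keep finding new rows and never reach end of file; instead it stops after the single original observation.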

Scott Barry
SBBWorks, Inc.
N/A
Posts: 0

Re: SET and OUTPUT same dataset within Datastep - An Issue?

Thank you very much, Scott.
I appreciate your taking the time to respond.

Your answer makes total sense. I guess I was thinking wishfully!
Thanks again.

Ravi
Super Contributor
Posts: 3,174

Re: SET and OUTPUT same dataset within Datastep - An Issue?

You will need to consider pre-processing your new input data file to create a suitable "interim master file" for additional processing. Options to consider are a hash table, a PROC FORMAT with a PUT function look-up, and, as you demonstrated, a SET with KEY= approach to find a matching record.
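As one possible reading of the hash-table option, here is a rough sketch in which the hash object itself holds the assigned IDs, so the step never has to read back the dataset it is writing. The dataset and variable names (work.claims, work.claims_uid, work.xwalk) are made up for illustration, and the sketch assumes the crosswalk is small enough to fit in memory, which may not hold in the original poster's situation:

```sas
/* Hypothetical input: repeated claim header values */
data work.claims;
   input clm_head :$20.;
   datalines;
A
B
A
C
B
;
run;

data work.claims_uid (drop=rc last_used_uid);
   length clm_head_uid 8;
   retain last_used_uid 0;

   if _n_ = 1 then do;
      declare hash hj();
      hj.definekey('clm_head');
      hj.definedata('clm_head', 'clm_head_uid');
      hj.definedone();
   end;

   set work.claims end=eof;

   /* find() returns 0 and fills clm_head_uid when the key is already stored */
   if hj.find() ne 0 then do;
      last_used_uid + 1;               /* assign the next unused ID */
      clm_head_uid = last_used_uid;
      rc = hj.add();
   end;

   /* At end of input, write the accumulated crosswalk to a dataset */
   if eof then rc = hj.output(dataset: 'work.xwalk');
run;
```

In this sketch the repeated keys A and B receive the same ID each time they appear, and the crosswalk is written once at the end with the hash OUTPUT method rather than row by row with the OUTPUT statement.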

Scott Barry
SBBWorks, Inc.
Valued Guide
Posts: 2,175

Re: SET and OUTPUT same dataset within Datastep - An Issue?

When you need to change a SAS data set with "update-in-place", look at using the MODIFY statement.
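A small sketch of what update-in-place with MODIFY can look like, using a hypothetical dataset (work.master) rather than the tables from the original question. MODIFY opens the existing dataset for update instead of building a replacement copy, and REPLACE rewrites the current observation in place:

```sas
/* Hypothetical master dataset, for illustration only */
data work.master;
   input id amount;
   datalines;
1 10
2 20
3 30
;
run;

data work.master;
   modify work.master;        /* open the existing file for update */
   if id = 2 then do;
      amount = amount + 5;
      replace;                /* rewrite this observation in place */
   end;
run;
```

Because the file is changed directly, an interrupted step can leave it partially updated, so it is worth testing on a copy first.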
N/A
Posts: 0

Re: SET and OUTPUT same dataset within Datastep - An Issue?

Hi Scott, Peter:

Thank you for your thoughts.

I have updated my process to do something like what Scott suggested above. Basically, I moved the "if hj.find() = 0" do block to a different data step which follows the above one. Now my process works the way I intended it to.

But I will look into the MODIFY statement to see if I can use it, because then I can avoid reading the two datasets again. I need to study whether direct access works with MODIFY, and whether anything else might cause issues.

Regards,
Ravi
Super Contributor
Posts: 3,174

Re: SET and OUTPUT same dataset within Datastep - An Issue?

From the SAS support website, a technical paper on the topic of MODIFY (somewhat dated, but still applicable to current SAS versions):

TS-250
DATA Step Programming Using the MODIFY Statement
http://support.sas.com/techsup/technote/ts250.html


Obviously, you will want to ensure you maintain a solid backup / recovery strategy with your master database reference.

Scott Barry
SBBWorks, Inc.