deleted_user
Not applicable
Hello,

I am trying to apply some code presented by Kenneth W. Borowiak at NESUG 2006 to see if I can use HASH more efficiently to avoid a memory issue. As Kenneth mentions in his paper, the idea was presented by Paul Dorfman and Lessia Shajenko in a 2006 paper.

Both papers assume that the data is already available to load into the hash, something like the following (taken from Kenneth's paper):

do until(eof_SubsetMe) ;
   set SubsetMe end=eof_SubsetMe ;
   n+1 ;
   Sub.add() ;
end;

In my case, I don't have the data beforehand. I want an entry to be loaded into my hash table whenever it is not already found there. My modified code is pasted below. When I run the code, it seems to get the value of 'n' correctly, but the direct-access read always seems to point to the last record in maps.&plan._UPD_UID_HEAD_PHM_XWalk.

Could there be a conflict between the DATA step pointer and the direct-access pointer? Once the "maps.&plan._UPD_UID_HEAD_PHM_XWalk" data set has been opened by SET, can it no longer be updated (by the OUTPUT statement)? And if it can, will it be re-"SET"? I feel I am missing something fundamental about how the DATA step works. Any help here is greatly appreciated.

My apologies in advance if I have left out any information that might be helpful. Please let me know if you need anything else.

My code:

data maps.&plan._UPD_UID_HEAD_PHM_XWalk;
   attrib clm_head          length=$20 label="Clm Head"
          clm_head_uid_HASH length=8   label="Clm Header UID HASH"
          map_source        length=$6.
   ;
   call missing(of _all_);
run;


options compress=no;
data clm_head
     maps.&plan._UPD_UID_HEAD_PHM_XWalk (keep=map_source clm_head
                                              clm_head_uid_HASH)
     ;

   attrib clm_head_uid        length=8 label="Clm Header UID"
          clm_head_uid_HASH   length=8 label="Clm Header UID HASH"
          Clm_Head_Match_Flag length=$1.
          last_used_uid       length=8
          n                   length=5;

   set clm_ds1 end=eof;

   retain last_used_uid %eval(&last_used_uid);

   if _n_ = 1 then do;
      declare hash hj(hashexp:32);
      hj.definekey('map_source','clm_head');
      hj.definedata('map_source','clm_head','clm_head_UID_HASH');
      hj.definedata('n');
      hj.definedone();
   end;

   set &plan._CUR_UID_HEAD_PHM_XWalk
       key=clm_head_dc / unique;

   if _iorc_ = 0 then do;
      Clm_Head_Match_Flag = 'Y';
   end;
   else do;
      if clm_head=' ' then do;
         Clm_Head_Match_Flag = 'Z';
         clm_head_UID = .;
      end;
      else do;
         Clm_Head_Match_Flag = 'N';

         if hj.find() eq 0 then do;
            set maps.&plan._UPD_UID_HEAD_PHM_XWalk
                point=n ;
            clm_head_UID = clm_head_UID_HASH;
            clm_headx = clm_head;
            Clm_Head_Match_Flag = 'A';
         end;
         else do;
            clm_head_UID_HASH = last_used_uid+1;
            clm_head_UID = clm_head_uid_HASH;
            last_used_uid = clm_head_UID_HASH;
            n+1;

            rc=hj.add();

            output maps.&plan._UPD_UID_HEAD_PHM_XWalk;
            /* if rc=0 then Clm_Head_Match_Flag = 'A';*/
         end;
      end;

      _iorc_=0;
      _error_=0;
   end;

   if eof then do;
      hj.delete();
   end;

   output clm_head;
run;

Thank you in advance.

Best,
Ravi
6 REPLIES
sbb
Lapis Lazuli | Level 10
The answer to your fundamental question, whether you can update a SAS data set with OUTPUT and then reference that same data set dynamically within the same DATA step, is no. The SAS data set named on the DATA statement is not the same physical file that you are referencing with the SET statement. It only becomes so when the DATA step completes and the permanent copy (the one named on the DATA statement) is replaced.

Scott Barry
SBBWorks, Inc.
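
One way to avoid re-reading the output data set at all is to keep the generated value in the hash itself and write the table out once at the end. The following is a minimal sketch, not the original poster's process: work.claims, work.new_xwalk, new_flag and the starting value 1000 are hypothetical stand-ins, and it assumes the new keys fit in memory.

```sas
/* Hypothetical input: repeated (map_source, clm_head) pairs */
data work.claims;
   length map_source $6 clm_head $20;
   input map_source $ clm_head $;
   datalines;
SRC1 A100
SRC1 B200
SRC1 A100
SRC2 A100
SRC1 B200
;
run;

%let last_used_uid = 1000;   /* assumed starting point for new UIDs */

data work.claims_uid (keep=map_source clm_head clm_head_uid new_flag);
   length clm_head_uid 8 new_flag $1;
   retain last_used_uid &last_used_uid;

   if _n_ = 1 then do;
      /* The UID is stored as hash data, so FIND() returns it directly */
      declare hash hj(hashexp: 16);
      hj.definekey('map_source', 'clm_head');
      hj.definedata('map_source', 'clm_head', 'clm_head_uid');
      hj.definedone();
   end;

   set work.claims end=eof;

   if hj.find() = 0 then new_flag = 'N';   /* key seen before: UID filled by FIND() */
   else do;
      last_used_uid + 1;                   /* key is new: assign the next UID */
      clm_head_uid = last_used_uid;
      rc = hj.add();
      new_flag = 'Y';
   end;

   /* Write the accumulated crosswalk once, after the last input row */
   if eof then rc = hj.output(dataset: 'work.new_xwalk');
run;
```

Because the lookup value lives in the hash, there is no SET with POINT= against a data set that is still being written.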
deleted_user
Not applicable
Thank you very much, Scott.
I appreciate your taking the time to respond.

Your answer makes total sense. I guess I was thinking wishfully!
Thanks again.

Ravi
sbb
Lapis Lazuli | Level 10
You will need to consider pre-processing your new input data file to create a suitable "interim master file" for additional processing. Options to consider are a hash table, a PROC FORMAT with a PUT function look-up, and, as you demonstrated, a SET with KEY= to find a suitable match condition.

Scott Barry
SBBWorks, Inc.
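
As an illustration of the PROC FORMAT option, here is a minimal sketch that builds a format from a master file with CNTLIN= and looks values up with PUT. The data set names (work.master_xwalk, work.cntlin, work.lookup), the format name UIDFMT and the 'NOMATCH' label are all hypothetical.

```sas
/* Hypothetical master crosswalk: one row per claim header */
data work.master_xwalk;
   length clm_head $20 clm_head_uid 8;
   input clm_head $ clm_head_uid;
   datalines;
A100 1001
B200 1002
;
run;

/* CNTLIN data set: START is the key, LABEL is what PUT returns */
data work.cntlin;
   length fmtname $8 type $1 start $20 label $12 hlo $1;
   retain fmtname 'UIDFMT' type 'C';
   set work.master_xwalk end=eof;
   start = clm_head;
   label = strip(put(clm_head_uid, 12.));
   hlo   = ' ';
   output;
   if eof then do;
      /* HLO='O' defines the OTHER range, returned for unmatched keys */
      start = ' ';
      label = 'NOMATCH';
      hlo   = 'O';
      output;
   end;
   keep fmtname type start label hlo;
run;

proc format cntlin=work.cntlin;
run;

/* Look-up: no sort or merge is needed on the incoming data */
data work.lookup;
   length clm_head $20 uid_text $12;
   input clm_head $;
   uid_text = put(clm_head, $uidfmt.);
   if uid_text ne 'NOMATCH' then clm_head_uid = input(uid_text, 12.);
   datalines;
A100
C300
;
run;
```

A format is fixed once PROC FORMAT has run, so this suits a master file that is complete before the look-up step starts.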
Peter_C
Rhodochrosite | Level 12
When you need to change a SAS data set with "update-in-place", look at using the MODIFY statement.
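
A minimal sketch of update-in-place with MODIFY and KEY=, assuming a master data set with a simple index on the key. The names work.master, work.trans and new_uid are hypothetical; %SYSRC is the SAS autocall macro that translates the mnemonic return codes.

```sas
/* Hypothetical master with a simple index on clm_head */
data work.master (index=(clm_head));
   length clm_head $20 clm_head_uid 8;
   input clm_head $ clm_head_uid;
   datalines;
A100 1001
B200 1002
;
run;

/* Hypothetical transactions: values to apply to the master */
data work.trans;
   length clm_head $20 new_uid 8;
   input clm_head $ new_uid;
   datalines;
B200 2002
C300 2003
;
run;

data work.master;
   set work.trans;
   modify work.master key=clm_head;    /* indexed direct access */
   select (_iorc_);
      when (%sysrc(_sok)) do;          /* key found: rewrite that row */
         clm_head_uid = new_uid;
         replace;
      end;
      when (%sysrc(_dsenom)) do;       /* key not found: append a row */
         clm_head_uid = new_uid;
         _error_ = 0;                  /* clear the error flag set by the failed look-up */
         output;
      end;
      otherwise do;
         put 'ERROR: unexpected _IORC_ = ' _iorc_;
         stop;
      end;
   end;
run;
```

Unlike a DATA step that names the data set only on the DATA statement, MODIFY changes the existing file directly, so an interrupted step can leave the master partly updated.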
deleted_user
Not applicable
Hi Scott, Peter:

Thank you for your thoughts.

I have updated my process to do something like what Scott suggested above. Basically, I moved the "if hj.find() = 0" do block to a different data step which follows the above one. Now my process works the way I intended it to.

But I will look into the MODIFY statement to see if I can use it, because then I can avoid reading the two datasets again. I need to study whether direct access works with MODIFY, and whether anything else might cause issues.

Regards,
Ravi
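
The two-step arrangement described above can be sketched roughly as follows: once the first step has finished and the crosswalk exists on disk, a second step loads it into a hash and does the look-up. This is an assumption about the general shape only, not the poster's actual code; work.new_xwalk, work.claims and work.claims_uid are hypothetical names.

```sas
/* Hypothetical crosswalk, as written by an earlier DATA step */
data work.new_xwalk;
   length map_source $6 clm_head $20 clm_head_uid 8;
   input map_source $ clm_head $ clm_head_uid;
   datalines;
SRC1 A100 1001
SRC1 B200 1002
;
run;

/* Hypothetical claims to be matched against the crosswalk */
data work.claims;
   length map_source $6 clm_head $20;
   input map_source $ clm_head $;
   datalines;
SRC1 A100
SRC1 B200
SRC2 C300
;
run;

data work.claims_uid;
   if _n_ = 1 then do;
      /* Define the crosswalk variables in the PDV without reading rows */
      if 0 then set work.new_xwalk;
      /* The data set is complete on disk, so it can be loaded in full */
      declare hash hj(dataset: 'work.new_xwalk');
      hj.definekey('map_source', 'clm_head');
      hj.definedata('clm_head_uid');
      hj.definedone();
   end;

   set work.claims;

   /* Variables read by SET are retained, so clear the UID on a miss */
   if hj.find() ne 0 then call missing(clm_head_uid);
run;
```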
sbb
Lapis Lazuli | Level 10
From the SAS support website, a technical paper on the topic of MODIFY (somewhat dated, but still applicable to the current SAS version):

TS-250
DATA Step Programming Using the MODIFY Statement
http://support.sas.com/techsup/technote/ts250.html


Obviously, you will want to ensure you maintain a solid backup / recovery strategy with your master database reference.

Scott Barry
SBBWorks, Inc.
