PeterClemmensen
Tourmaline | Level 20

Hi all.

 

Recently, I stumbled upon the MEMRC option of the hash object DEFINEDONE() method. The documentation is sparse, but says that:

 

"If a call fails because of insufficient memory to load a data set, a nonzero return code is returned. The hash object frees the principal memory in the underlying array. The only allowable operation after this type of failure is deletion via the DELETE method."

 

So, if a data set is too big to fit in the hash object, the DEFINEDONE method will simply return a nonzero value rather than terminate the DATA step. Like below.

 

data big;
   do x = 1 to 1e9;
      output;
   end;
run;

data _null_;

   declare hash h (dataset : 'big');
   h.definekey ('x');
   rc = h.definedone (memrc : 'y'); 

   put rc = ;

   x = .;

   /* More code ... */

run;

 

According to the doc, the only thing I am allowed to do with h now is call the DELETE method.


What the documentation does not mention is that this is true only for that instance of the hash object. I am allowed to create another instance of h and work with it as usual, like below.

 

data big;
   do x = 1 to 1e9;
      output;
   end;
run;

data small;
   do x = 1 to 10;
      output;
   end;
run;

data _null_;

   declare hash h (dataset : 'big');
   h.definekey ('x');
   rc = h.definedone (memrc : 'y');

   put rc =;

   h = _new_ hash (dataset:'small');
   h.definekey ('x');
   rc = h.definedone (memrc:'y');

   put rc =;

   x = 1;

   rc = h.find ();

run;

 

Do any of you have experience with the MEMRC option? Or opinions on it? Or maybe you know something that the documentation missed?

 

I simply messed around with it on my own, as I could hardly find it mentioned anywhere but in the few lines in the documentation.

 

I tagged a few hash people below, but anyone is more than welcome to chime in 🙂

 

@hashman @DonH @novinosrin @mkeintz 

 

 

 

 

1 ACCEPTED SOLUTION

The accepted solution is the reply from @DonH below.

9 REPLIES
novinosrin
Tourmaline | Level 20

Good morning @PeterClemmensen. Thank you for the mention; needless to say, your kindness and altruism are obvious. This piece is something I haven't really tested, nor do I know much about it. Since I am not a documentation believer and have been privileged to have a paid subscription to O'Reilly/Safari media, I tend to search and find stuff in the books. I just checked a moment ago, and it appears Guru @hashman / @DonH haven't covered that piece in the book yet; perhaps that is for the next edition.

 

I look forward to Guru PD/DH chiming in on the topic and will take notes like any other audience member. Thank you for bringing up such interesting questions. Have a great day! Ciao!

 

PS: The 2nd edition (if there is one) hopefully will also have a Q&A, as my college mates and professors in the BI/Stats department at DePaul were really asking for it.

yabwon
Onyx | Level 15
The 2nd edition of the HashBook - I second that!

all the best
Bart



hashman
Ammonite | Level 13

@yabwon :

 

With your fabulous errata sheet taken into account!

 

Kind regards

Paul D.

DonH
Lapis Lazuli | Level 10

Thanks for the question and the tag, draycut. A couple of thoughts on this (I have not tried your code yet).

 

First, @hashman and I discovered quite a few things in the documentation that were less than clear. So add this to the list.

Next, your observation that you could use the hash object later in your code is not surprising. The non-scalar object h in your example is actually a pointer to a memory location, not a physical structure. So if the load of the data fails, and SAS properly cleaned up on that failure (more on this later), there is no reason why that hash object can't be re-used as long as the structure of the key and data portions stays the same (in your example, X is the only key and the only data item). There are numerous examples in the documentation, in the book @hashman and I wrote, and here in the community of such re-use after using the CLEAR method.
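The CLEAR-based re-use Don describes might be sketched like this (a toy example; the data set name and the added key value are illustrative, not from the thread):

```sas
data small;
   do x = 1 to 10;
      output;
   end;
run;

data _null_;
   /* Load the table once */
   declare hash h (dataset:'small');
   h.definekey ('x');
   h.definedone ();

   /* ... use the table ... */

   /* Empty the items but keep the key/data definition intact */
   rc = h.clear ();

   /* The same instance accepts new items with the same structure */
   x = 42;
   rc = h.add ();
run;
```

The point is that CLEAR voids the items without destroying the object, so the pointer h and its key/data definition remain usable.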

Given this and returning to the issue of the documentation, it would not surprise me if cleaning up after such a failed load was fixed in a later release and this nuance of the documentation was just missed.

And to respond to @novinosrin, there is not an explicit example of this in our book. So if there is a second edition, we could certainly include this. And WRT a Q&A, we can do that right here in the community (note that I tagged this reply using DMHashBook which is a tag @hashman and I use in articles we post here).

The issue of re-using hash object pointers is covered in a few places in the book - most notably in the memory management chapter - and creating multiple instances of a hash object is covered in the hash-of-hashes chapter.

Hope this helps.

hashman
Ammonite | Level 13

@PeterClemmensen:

Thanks for drawing attention to this angle. @DonH and I had discussed whether to include MEMRC in the book (it was in "my" chapter and I originally did) but decided against it, weighing how to reduce the over-the-limit page count.

 

This argument tag gives the programmer the option to continue processing if a hash table fills beyond the system memory limit, rather than abending the step - and the program. Conceivably, it can be used to decide programmatically to resort to a different method of processing should the hash step overfill memory. Sort of like: "if the hash step should overfill the memory, detect the condition and use a different piece of code - e.g. some divide-and-conquer methodology - to process the data in a different manner and attain the same goal".
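That detect-and-fall-back idea might be sketched like so (the flag name and the sort-based fallback are illustrative assumptions, not from the thread; the data set 'big' is the one from the original question):

```sas
data _null_;
   declare hash h (dataset:'big');
   h.definekey ('x');
   rc = h.definedone (memrc:'y');

   if rc ne 0 then do;
      rc = h.delete ();              /* the only allowed call after the failed load */
      call symputx ('overfill', 1);  /* flag the condition for later steps */
   end;
   else call symputx ('overfill', 0);
   x = .;
   stop;
run;

%macro process;
   %if &overfill %then %do;
      /* divide-and-conquer fallback, e.g. sort-based processing */
      proc sort data = big;
         by x;
      run;
   %end;
   %else %do;
      /* proceed with the hash-based logic */
   %end;
%mend process;
%process
```

Here the DATA step merely probes whether the load succeeds and records the outcome in a macro variable, so subsequent steps can branch to a non-hash code path.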

 

One problem with this approach is that under certain scenarios the hash step can run for hours before running out of hash memory - as @DonH and I found the hard way in the real data processing world while (ab)using the hash object for data aggregation. When one aggregate was done with, the table would be voided via CLEAR before proceeding to the next aggregate; and it could turn out that the aggregate that overfilled the table beyond the memory limit was one of the last, after hours of the job running.

 

However, methinks it's better to have MEMRC just in case than not to have it at all, however rare and exotic its usage might be.

 

Kind regards

Paul D.   

DonH
Lapis Lazuli | Level 10

Thanks @hashman for the reminder about that having to get pulled. I'll find it and post it as an article.

mkeintz
PROC Star

@PeterClemmensen   Interesting question.  I've never had reason to learn about the memrc argument. 

 

And it made me wonder whether a non-zero return from the definedone() method would release the otherwise useless memory.

 

Answer: it does.

 

I ran the following program while monitoring memory use by SAS on my Windows machine (a technique I recently learned from @hashman in the SAS-L listserv group). It showed memory for SAS bumping up from 64MB to 1400MB. But when the DEFINEDONE method returned a 160030, it dropped back to 64MB long before the DATA step ended.

 

data big /view=big;
  i=1; 
  length ch1-ch20 $1000;
  array _ch ch: ;
  do over _ch;   _ch=repeat('x',999); end;
  do i=1 to 1e6; output; end;
run;
   
data _null_;
   if 0 then set big;
   call sleep(5,1);
   declare hash h (dataset:'big');
      h.definekey('i');
      h.definedata(all:'Y');
      rc=h.definedone(memrc:'y');
   put rc=;
   call sleep(5,1);
   stop;
run;

 

hashman
Ammonite | Level 13

@mkeintz:

 

Mark, whenever I had to resort to this SLEEP + Windows Task Manager subterfuge, it always made me internally swear at not having a SAS function that would return the amount of memory currently used by the step, or at least by the SAS session. Methinks it wouldn't be too hard to implement in the underlying software.

 

Kind regards

Paul D. 

PeterClemmensen
Tourmaline | Level 20

Thank you all for chiming in. It makes more sense now. Though I am not sure I will ever need the option in a real-world situation, it is nice to know it is there.

 

I can only accept one of the answers as a solution. I picked @DonH's because I found it well-crafted, and because he mentioned a second edition of his and @hashman's book, which I look very much forward to 🙂

 

Again, thank you.


Discussion stats
  • 9 replies
  • 4153 views
  • 19 likes
  • 6 in conversation