RichardDeVen
Barite | Level 11

It appears I was hasty in my speculation and incorrect in presuming non-persistence. Oh, the dangers of assuming.

Based on Art Carpenter's 2018 paper "Using Memory Resident Hash Tables to Manage Your Lookup Control Files" (thanks for the link, @data_null__), I worked up some code that appears to clearly verify that a declared hash variable (and its initialization via the dataset: argument tag) is persisted between calls within the scope of a single step. This example loads the hash from a view so that a message is written to the log whenever the view is processed, and that message appears only once. This indicates to me that, internally, the hash component is caching declared instantiations, treating each one as a singleton. It would also mean that methods such as .defineKey, .defineData, and .defineDone only do something, with regard to setting up the component, once: at execution points prior to the first .defineDone.

Tested in SAS 9.4M6

dm 'clear output';

data names / view=names;
  set sashelp.class;
  where name in: ('A' 'J');
  if _n_ = 1 then 
    putlog 'NOTE: ----------'                        /* this goes into the Log window */
         / 'NOTE: Names view being accessed'
         / 'NOTE: ----------' 
         / '00'x 
         ;
run;

proc fcmp outlib=work.sandbox.functions;
  function found(name $);

    declare hash h(dataset:'names');
    rc = h.defineKey('name');
    rc = h.defineDone();

    put 'function found(), checking ' name=;   /* this goes only to the OUTPUT window (not LOG, not ODS) */

    return (h.check() = 0);
  endsub;
quit;

options cmplib=work.sandbox;
proc print data=sashelp.class;                 /* this goes to the ODS destinations */
  where found(name);
run;

dm 'output' output;

Art's paper also talks about how FUTURE DEVELOPMENT should address the issue of keeping a hash table in memory across step boundaries.

---- Original post ----

The paper "Hashing in PROC FCMP to Enhance Your Productivity" by Andrew Henrick, Donald Erdman, and Stacey Christian demonstrates how to use a hash object in an FCMP function.

data names;
  set sashelp.class;
  where name in: ('A' 'J');
run;

proc fcmp outlib=work.sandbox.functions;
  function found(name $);
    declare hash h(dataset: "work.names");
    rc = h.defineKey("name");
    rc = h.defineDone();
    return (h.check() = 0);
  endsub;
quit;

options cmplib=work.sandbox;
proc print data=sashelp.class;
  where found(name);
run;

All well and good, except for the fact that the DECLARE statement in the function FOUND instantiates a hash and reads the table WORK.NAMES from disk into the hash EVERY TIME the function FOUND() is called from the WHERE statement in the PROC PRINT.

Is there a setting or strategy for persisting the hash in memory between function calls? I would like to read the table one time, for the duration of the thread associated with the PROC PRINT step.

This is a simple example, but the bigger application is exploring how to use a hash in an FCMP function whose role is to act as a cache. The data elements of the hash would be the results of an expensive I/O + CPU computation, which I would want to avoid repeating if the same set of inputs is passed to the function a second time.
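A minimal sketch of that cache idea follows. It assumes, as the update at the top of this thread suggests, that a hash declared inside an FCMP function persists between calls within a single step; if it does not, the function still returns correct values but recomputes them on every call. The function name cached_calc, the data variable result, and the x**2 stand-in for the expensive computation are hypothetical placeholders, not part of the original post.

```sas
proc fcmp outlib=work.sandbox.functions;
  function cached_calc(x);
    /* assumption: the declared hash persists across calls within one step,
       as the update above suggests; otherwise every call is a cache miss */
    result = .;                      /* host variable for the hash data item */

    declare hash h();                /* empty hash, filled as calls come in */
    rc = h.defineKey('x');
    rc = h.defineData('result');
    rc = h.defineDone();

    if h.find() ne 0 then do;        /* nonzero return code: key not cached yet */
      result = x**2;                 /* hypothetical stand-in for the expensive work */
      rc = h.add();                  /* store the result under key x */
    end;

    return (result);
  endsub;
quit;

options cmplib=work.sandbox;
data _null_;
  do x = 4, 9, 4;                    /* the repeated 4 is a cache hit only if the hash persists */
    y = cached_calc(x);
    put x= y=;
  end;
run;
```

The returned values are the same either way; only the amount of recomputation depends on whether the hash really survives between calls.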

6 replies
ballardw
Super User

Are you aware of any function in any programming language that persists data from previous calls?

ballardw
Super User

@Kurt_Bremser wrote:

In C, if you declare a variable as static.

All RAND functions in SAS need to do that.


So a VARIABLE is static, not a function.

I might be misunderstanding.

mkeintz
PROC Star

Edited note: I take everything below back. Memory does not increase using the hash-inside-FCMP when I increase the number of observations sent to PROC PRINT (by a factor of 100). So apparently the object IS being removed from memory in each "iteration" of the PROC PRINT. It is apparently not persisting.

@RichardDeVen is not saying that the hash object doesn't persist; it's that successive function calls do not automatically recognize that the object is still in memory and presumably available. Instead, each call honors the DECLARE statement and makes a new one, with exactly the same name and content, i.e. a duplicate.

The fact that the object has the same name doesn't cause SAS to relinquish the memory taken up by the first instantiation. (After all, that would defeat the ability to do hash-of-hashes.)

So, no doubt, in addition to the CPU and I/O expense of repeatedly reading in the data table, there will be a giant memory cost as well.

Edited correction: Oops, I just realized that this is for a proc, not a DATA step, so I presume _N_ is not available. Is the automatic variable _N_ available within PROC FCMP? If so, and if you always use the function at _n_=1, then I guess you could use the same "if _n_=1 then do" technique of instantiating the hash object.
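For comparison, the DATA step version of the "if _n_=1 then do" technique might look like the sketch below. It rebuilds the lookup table from the original post (using a SUBSTR condition in place of IN:), and the output data set name FOUND_NAMES is a hypothetical choice. Because the DECLARE executes only on the first iteration and the hash object variable is not reset between iterations, WORK.NAMES is read into memory once for the whole step. Whether PROC FCMP offers an equivalent hook is the open question in this reply; the sketch does not answer it.

```sas
data names;                          /* lookup table, as in the original post */
  set sashelp.class;
  where substr(name, 1, 1) in ('A', 'J');
run;

data found_names;                    /* hypothetical output name */
  if _n_ = 1 then do;
    /* executes only on the first iteration: WORK.NAMES is loaded once */
    declare hash h(dataset: 'work.names');
    h.defineKey('name');
    h.defineDone();
  end;

  set sashelp.class;                 /* also puts NAME in the PDV at compile time */
  if h.check() = 0;                  /* subsetting IF: keep rows whose NAME is a key */
run;

proc print data=found_names;
run;
```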


SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 897 views
  • 1 like
  • 5 in conversation