Is there a way to persist a hash in an FCMP function between function calls?

RichardDeVen · Posted 12-04-2020 08:38 AM

It appears I was hasty in my speculation and wrong to presume non-persistence. Oh, the dangers of assuming.

Based on Art Carpenter's 2018 paper "Using Memory Resident Hash Tables to Manage Your Lookup Control Files" (thanks for the link, @data_null__), I worked up some code that appears to clearly verify that a declared hash variable, and its initialization via the dataset: argument tag, persists between calls within the scope of a single step. The example loads the hash from a view so that a message is written to the log whenever the view is processed, and that message appears only once. This indicates to me that the hash component internally caches declared instantiations, singleton-style. It would also mean that methods such as .defineKey, .defineData and .defineDone only set up the component once, at execution points before the first .defineDone.

Tested in SAS 9.4M6

dm 'clear output';

data names / view=names;
  set sashelp.class;
  where name in: ('A' 'J');
  if _n_ = 1 then 
    putlog 'NOTE: ----------'                        /* this goes into the Log window */
         / 'NOTE: Names view being accessed'
         / 'NOTE: ----------' 
         / '00'x 
         ;
run;

proc fcmp outlib=work.sandbox.functions;
  function found(name $);

    declare hash h(dataset:'names');
    rc = h.defineKey('name');
    rc = h.defineDone();

    put 'function found(), checking ' name=;   /* this goes only to the OUTPUT window (not LOG, not ODS) */

    return (h.check() = 0);
  endsub;
quit;

options cmplib=work.sandbox;
proc print data=sashelp.class;                 /* this goes to the ODS destinations */
  where found(name);
run;

dm 'output' output;
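By analogy only (this is C, not FCMP, and says nothing about how SAS implements the hash internally), the "load once, reuse afterwards" behaviour observed above resembles lazy initialization guarded by a function-local static flag. In this sketch `found()`, `load_table()` and the sample names are hypothetical stand-ins for the FCMP function and the NAMES view; `load_count` plays the role of the one-time NOTE in the log.

```c
#include <stddef.h>
#include <string.h>

/* Counts how many times the simulated table load runs; plays the role
   of the one-time NOTE written by the NAMES view in the SAS example. */
static int load_count = 0;

/* Hypothetical sample keys standing in for the NAMES lookup table. */
static const char *const source_names[] = {"Alfred", "Alice", "James", "Jane"};
#define N_NAMES (sizeof source_names / sizeof source_names[0])

/* Simulated in-memory copy of the table. */
static const char *table[N_NAMES];

/* Simulates reading the lookup table into memory. */
static void load_table(void)
{
    for (size_t i = 0; i < N_NAMES; i++)
        table[i] = source_names[i];
    load_count++;
}

/* Analogue of FOUND(): returns 1 if name is in the table, else 0.
   The static flag keeps its value between calls, so the table is
   loaded only on the first call. */
int found(const char *name)
{
    static int loaded = 0;
    if (!loaded) {
        load_table();
        loaded = 1;
    }
    for (size_t i = 0; i < N_NAMES; i++)
        if (strcmp(table[i], name) == 0)
            return 1;
    return 0;
}

int get_load_count(void) { return load_count; }
```

However many times `found()` is called, `get_load_count()` stays at 1, which mirrors the single NOTE seen in the log.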

Art's paper also notes that FUTURE DEVELOPMENT should address keeping a hash table in memory across step boundaries.

---- Original post ----

The paper "Hashing in PROC FCMP to Enhance Your Productivity" (Andrew Henrick, Donald Erdman, and Stacey Christian) demonstrates how to use a hash in an FCMP function.

data names;
  set sashelp.class;
  where name in: ('A' 'J');
run;

proc fcmp outlib=work.sandbox.functions;
  function found(name $);
    declare hash h(dataset: "work.names");
    rc = h.defineKey("name");
    rc = h.defineDone();
    return (h.check() = 0);
  endsub;
quit;

options cmplib=work.sandbox;
proc print data=sashelp.class;
  where found(name);
run;

All well and good, except that the DECLARE statement in function FOUND instantiates a hash and reads the table WORK.NAMES from disk into it EVERY TIME FOUND() is called from the WHERE statement in PROC PRINT.

Is there a setting or strategy for persisting the hash in memory between function calls? I would like to read the table only once, for the duration of the thread associated with the PROC PRINT.

This is a simple example, but the bigger application is exploring how to use a hash in an FCMP function whose role is to act as a cache. The data elements of the hash would be the results of an expensive I/O and CPU computation that should be avoided when the same set of inputs is passed to the function a second time.
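For comparison, here is what such a cache (memoization) looks like in a language where function-local state can persist: a minimal C sketch, not an FCMP solution. The names `expensive()` and `cached()` are hypothetical, the "expensive" work is just a sum of squares, and a small direct-indexed array stands in for the hash to keep the example short.

```c
#include <stdbool.h>

/* Counts how often the expensive computation actually runs. */
static int compute_count = 0;

/* Hypothetical expensive computation (stands in for the I/O + CPU work):
   returns the sum of squares 1..x. */
static long expensive(int x)
{
    compute_count++;
    long acc = 0;
    for (int i = 1; i <= x; i++)
        acc += (long)i * i;
    return acc;
}

#define CACHE_SIZE 64

/* Memoizing wrapper: results for inputs 0..CACHE_SIZE-1 are kept in
   function-local static arrays, so they survive between calls. */
long cached(int x)
{
    static bool have[CACHE_SIZE];   /* zero-initialized: all false */
    static long value[CACHE_SIZE];

    if (x < 0 || x >= CACHE_SIZE)
        return expensive(x);        /* outside the cache: always recompute */
    if (!have[x]) {
        value[x] = expensive(x);
        have[x] = true;
    }
    return value[x];
}

int get_compute_count(void) { return compute_count; }
```

Calling `cached(10)` twice runs `expensive()` only once; a new input such as `cached(3)` triggers one more computation.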

data_null__ · Posted 12-04-2020 10:04 AM

This paper might be helpful.

Using Memory Resident Hash Tables to Manage Your Lookup Control Files

ballardw · Posted 12-04-2020 10:09 AM

Are you aware of any function in any programming language that persists data from previous calls?

Kurt_Bremser · Posted 12-04-2020 10:23 AM

In C, if you declare a variable as static.

All RAND functions in SAS need to do that.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 12-04-2020 10:44 AM

@Kurt_Bremser wrote:

In C, if you declare a variable as static.

All RAND functions in SAS need to do that.

So a VARIABLE is static. Not a Function.

I might be misunderstanding.

Kurt_Bremser · Posted 12-04-2020 12:33 PM

If a static variable is defined within a function definition, it is local to the function, but will persist from call to call.
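A minimal C illustration of the point above: both functions below are hypothetical, and the only difference between them is the `static` keyword on the local variable.

```c
/* `calls` is static: it is initialized once and keeps its value
   between calls, so each call returns the next count. */
int static_counter(void)
{
    static int calls = 0;
    return ++calls;
}

/* Same shape with an ordinary (automatic) local: it is re-created and
   re-initialized on every call, so it always returns 1. */
int automatic_counter(void)
{
    int calls = 0;
    return ++calls;
}
```

Three calls to `static_counter()` return 1, 2 and 3, while `automatic_counter()` returns 1 every time. The variable remains local in scope (no other function can see it); only its lifetime extends across calls.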


mkeintz · Posted 12-04-2020 03:03 PM

Edited note: I take back everything below. Memory did not increase with the hash inside FCMP when I increased the number of observations sent to PROC PRINT by a factor of 100. So apparently the object IS being removed from memory in each "iteration" of the PROC PRINT; it is apparently not persisting.

@RichardDeVen is not saying that the hash object doesn't persist - it's that successive function calls do not automatically recognize that the object is still in memory and presumably available. Instead it honors the declare statement and makes a new one, with exactly the same name and content, i.e. a duplicate.

The fact that the object has the same name doesn't cause SAS to relinquish the memory taken up by the first instantiation. (After all, that would defeat the ability to do hash-of-hashes).

So, no doubt, in addition to the CPU and I/O expense of repeatedly reading in the data table, there will be a giant memory cost as well.

Edited correction. Oops, I just realized that this is for a proc, not a DATA step, so I presume _N_ is not available. Is the automatic variable _N_ available within PROC FCMP? If so, and if you always use the function at _N_=1, then I guess you could use the same "if _n_=1 then do" technique for instantiating the hash object.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
