Hello,
I have run into a problem with hash programming and CALL EXECUTE.
I have a parameter table that I read with a DATA step, and for each row I use CALL EXECUTE. The generated code contains a DATA step with a SELECT/WHEN statement based on a column of the parameter table, and some of the WHEN branches declare hash tables. Below is how my code is structured (it is a simplification):
%macro my_macro(rule,param1,param2);
data _null_;
  select(&rule);
    when("Something 1") do;
      if (a condition) then do;
        dcl hash h1("dataset: <a table with 1 million rows>", hashexp:10);
        h1.declarekey(...);
        h1.definedone();
        dcl hash h2("dataset: <a table with 1 million rows>", hashexp:10);
        h2.declarekey(...);
        h2.definedone();
        /* <more code>*/
      end;
    end;
    when ("Something2") do;
      if ( a condition ) then do;
        dcl hash h3("dataset: <a table with 1 million rows>", hexp:10);
        h3.declarekey(...);
        h3.definedone();
      end;
      /* <more code>*/
    end;
  end;
run;
%mend my_macro;
data _null_;
  set param_table;
  call execute('%my_macro(rule=' || strip(rule) || ', param1=' || strip(col2) || ', param2=' || col3 );
run;
Everything works until the last line of my parameter file, where I get a 'Memory failure'.
I can't change the sasv9 config; MEMSIZE is already set to the maximum (0, which lets SAS decide how much memory it needs).
Since a hash object only lives for the duration of its DATA step, my question is: which DATA step keeps the hash in memory in this case? The first one, which calls CALL EXECUTE, or the second one, which is run by CALL EXECUTE?
Or maybe all the hash tables are created without the condition being tested (normally not).
Thanks in advance for the help.
You re-declare the hash every time a condition is met. If you do not take care to completely remove the declaration when you're finished using it, "dead" hash objects will accumulate in memory.
Declare hash objects once (at _n_ = 1), and clear their contents as needed (using the CLEAR() method) if you need to fill them with different content. But from what I see, you just load a table, so you need to do that only once, and only use the FIND() or CHECK() methods later.
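As a rough illustration of the advice above, here is a minimal sketch of the "declare once, then only look up" pattern. The tables work.lookup and work.trans and the variable key are hypothetical names made up for this example; they are not from the original post.

```sas
/* Hypothetical lookup table, for illustration only */
data work.lookup;
  do key = 1 to 5;
    output;
  end;
run;

/* Hypothetical table whose rows are checked against the lookup */
data work.trans;
  do key = 3 to 7;
    output;
  end;
run;

data work.matched;
  /* Instantiate and load the hash on the first iteration only */
  if _n_ = 1 then do;
    declare hash h(dataset: "work.lookup");
    h.defineKey("key");
    h.defineDone();
  end;
  set work.trans;
  /* CHECK() returns 0 when the current value of key is in the hash */
  if h.check() = 0;
run;
```

Because the DECLARE and DEFINEDONE() calls sit inside the _n_ = 1 block, the lookup table is read into memory a single time, no matter how many rows work.trans has.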
Hi,
after rewriting your example into "runnable" code:
data a_table_with_1_million_rows;
  do key1 = 1 to 1e6;
    /* key2 and key3 copy key1 so the table has exactly three variables */
    key2 = key1;
    key3 = key1;
    output;
  end;
run;
data param_table;
col2 = "A"; col3 = "B";
rule = "Something 1"; output;
rule = "Something 2"; output;
run;
%macro my_macro(rule,param1,param2);
data _null_;
  if 0 then set a_table_with_1_million_rows;
  select(&rule);
    when("Something 1")
      do;
        if (_N_ = 1) then do;
          dcl hash h1(dataset: "a_table_with_1_million_rows", hashexp:10);
          h1.defineKey("key1", "key2", "key3");
          h1.defineDone();
          dcl hash h2(dataset: "a_table_with_1_million_rows", hashexp:10);
          h2.defineKey("key1", "key2", "key3");
          h2.defineDone();
          /* <more code>*/
        end;
      end;
    when ("Something 2")
      do;
        if ( _N_ = 1 ) then do;
          dcl hash h3(dataset: "a_table_with_1_million_rows", hashexp:10);
          h3.defineKey("key1", "key2", "key3");
          h3.defineDone();
        end;
        /* <more code>*/
      end;
    otherwise;
  end;
  stop;
run;
%mend my_macro;
data _null_;
set param_table;
call execute('%my_macro(rule="' || strip(rule) || '", param1=' || strip(col2) || ', param2=' || col3 || ")");
run;
Note that I've added "_N_ = 1" to the IF condition. You can see that a different number of datasets is loaded into hashes for each line of `param_table`: two hashes for the first line, one hash for the second:
NOTE: There were 2 observations read from the data set WORK.PARAM_TABLE.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              454.90k
      OS Memory           16884.00k

NOTE: CALL EXECUTE generated line.
1 + data _null_; if 0 then set a_table_with_1_million_rows; select("Something 1"); when("Something 1") do; if (_N_ = 1) then do; dcl hash h1(dataset: "a_table_with_1_million_rows", hashexp:10); h1.defineKey("key1",
2 + "key2", "key3"); h1.defineDone(); dcl hash h2(dataset: "a_table_with_1_million_rows", hashexp:10); h2.defineKey("key1", "key2", "key3"); h2.defineDone(); end; end; when ("Something 2"
3 +) do; if ( _N_ = 1 ) then do; dcl hash h3(dataset: "a_table_with_1_million_rows", hashexp:10); h3.defineKey("key1", "key2", "key3"); h3.defineDone(); end; end; otherwise; end;
4 + stop; run;

NOTE: There were 1000000 observations read from the data set WORK.A_TABLE_WITH_1_MILLION_ROWS.
NOTE: There were 1000000 observations read from the data set WORK.A_TABLE_WITH_1_MILLION_ROWS.
NOTE: DATA statement used (Total process time):
      real time           0.66 seconds
      user cpu time       0.54 seconds
      system cpu time     0.10 seconds
      memory              164777.87k
      OS Memory           180276.00k

5 + data _null_; if 0 then set a_table_with_1_million_rows; select("Something 2"); when("Something 1") do; if (_N_ = 1) then do; dcl hash h1(dataset: "a_table_with_1_million_rows", hashexp:10); h1.defineKey("key1",
6 + "key2", "key3"); h1.defineDone(); dcl hash h2(dataset: "a_table_with_1_million_rows", hashexp:10); h2.defineKey("key1", "key2", "key3"); h2.defineDone(); end; end; when ("Something 2"
7 +) do; if ( _N_ = 1 ) then do; dcl hash h3(dataset: "a_table_with_1_million_rows", hashexp:10); h3.defineKey("key1", "key2", "key3"); h3.defineDone(); end; end; otherwise; end;
8 + stop; run;

NOTE: There were 1000000 observations read from the data set WORK.A_TABLE_WITH_1_MILLION_ROWS.
NOTE: DATA statement used (Total process time):
      real time           0.33 seconds
      user cpu time       0.29 seconds
      system cpu time     0.04 seconds
      memory              82728.40k
      OS Memory           98580.00k
In my example, "a_table_with_1_million_rows" has only 3 variables and is ~23MB in size. For the first observation of `param_table`, the first DATA step executed by CALL EXECUTE uses 164777.87k of memory (two hashes); for the second observation, the second DATA step uses about half of that, 82728.40k (one hash). So the answer to your question is: each CALL EXECUTE generates a separate DATA step with its own hash tables, and each step uses RAM separately.
Try running your code with:
options fullstimer msglevel = i ;
turned on to see how much memory each step uses.
All the best
Bart
As stated by others, you are instantiating and loading a hash for every row that meets a select criterion. Each new instantiation overwrites the hash reference, so any memory associated with the prior instance still exists but is unreachable. Congratulations, you have created a memory leak!
Your code indicates you want to load hash data only when it is needed according to some select condition (let's call this dynamic hash instance loading), versus pre-loading all possibly needed hash data (static hash instance loading).
Going down the dynamic path...
You can declare a hash in a non-executable statement
declare hash h;
But there is no way to test whether h is 'null', which would mean there is no hash instance associated with it.
Any attempt to use a hash method will log errors
ERROR: Uninitialized object at line #### column ##.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
You can declare an empty hash instance, but instantiation is an executable operation and needs to be guarded against repeated execution using an _n_=1 block.
NUM_ITEMS=0 logic will tell you the hash has not been loaded (dynamically) yet. If your hash data loading data set has 0 rows, that is another problem for you to deal with.
if _n_ = 1 then do;
  /* declare an 'empty' instance once but do not load it. num_items can be used and will return 0. */
  /* This takes up a tiny amount of memory. */
  declare hash h();
end;
The dynamic loading of a hash will create a new instance and populate it.
when (<condition>) do;
  if h.num_items=0 then do;
    * dynamic loading per condition;
    * load once;
    h = _new_ hash (dataset:'<tablename>');
    h.defineKey('<key-column>');
    h.defineData('<data-column>');
    h.defineDone();
  end;
  ...
  ... some code with a h.<hash-method>() ...
Example code with three hash objects dynamically loaded
data lookup1;
  do key = 1 to 10; value = key**2; output; end;
run;
data lookup2;
  do key = 1 to 10; value = key**1/2; output; end;
run;
data lookup3;
  do key = 1 to 10; value = 1e6+key; output; end;
run;

data have;
  input lookup_table key @@;
  datalines;
1 2 3 4 1 3 2 4 5 1 1 1 2 2 3 3 3 10
;

data want;
  if _n_ = 1 then do;
    /* declare an 'empty' instance once but do not load it. num_items will be 0. */
    /* This takes up a tiny amount of memory. */
    declare hash lookup1();
    declare hash lookup2();
    declare hash lookup3();
  end;
  set have;
  select (lookup_table);
    when (1) do;
      if lookup1.num_items=0 then do;
        * load (once) hash data on requirement demand (the when condition);
        lookup1 = _new_ hash (dataset:'lookup1');
        lookup1.defineKey('key');
        lookup1.defineData('value');
        lookup1.defineDone();
      end;
      /* find() returns a return code; on success it fills value from the hash */
      rc = lookup1.find();
    end;
    when (2) do;
      if lookup2.num_items=0 then do;
        * load once;
        lookup2 = _new_ hash (dataset:'lookup2');
        lookup2.defineKey('key');
        lookup2.defineData('value');
        lookup2.defineDone();
      end;
      rc = lookup2.find();
    end;
    when (3) do;
      if lookup3.num_items=0 then do;
        * load once;
        lookup3 = _new_ hash (dataset:'lookup3');
        lookup3.defineKey('key');
        lookup3.defineData('value');
        lookup3.defineDone();
      end;
      rc = lookup3.find();
    end;
    otherwise value = .;
  end;
  drop rc;
run;
Hello, thanks to all for your responses and clarification. It was very helpful !
So in my code, I added the clause "if _n_=1" before declaring the hash tables, and it works perfectly now.
Regards.