willi_m
Calcite | Level 5

Hello, 

 

I have run into a problem with hash objects and CALL EXECUTE.

I have a parameter table that I read with a data step, and for each row I issue a CALL EXECUTE. The CALL EXECUTE runs a macro containing a data step with a SELECT/WHEN block driven by a column of the parameter table, and some of the WHEN branches declare hash tables. Below is how my code is structured (it is a simplification):

 

%macro my_macro(rule,param1,param2);

data _null_;
  select(&rule);
    when("Something 1") do;
      if (a condition) then do;
        dcl hash h1(dataset: "<a table with 1 million rows>", hashexp:10);
        h1.defineKey(...);
        h1.defineDone();
        dcl hash h2(dataset: "<a table with 1 million rows>", hashexp:10);
        h2.defineKey(...);
        h2.defineDone();
        /* <more code>*/
      end;
    end;
    when ("Something2") do;
      if ( a condition ) then do;
        dcl hash h3(dataset: "<a table with 1 million rows>", hashexp:10);
        h3.defineKey(...);
        h3.defineDone();
      end;
      /* <more code>*/
    end;
  end;
run;

%mend my_macro;

data _null_;
  set param_table;
  call execute('%my_macro(rule=' || strip(rule) || ', param1=' || strip(col2) || ', param2=' || col3 || ')');
run;

Everything works until the last row of my parameter table, where I get a 'Memory failure'.

I can't change the sasv9 config. MEMSIZE is already set to the maximum (0, which lets SAS decide how much memory it needs).

 

Since a hash object only lives as long as its data step, my question is: which data step keeps the hash in memory in this case? The first one, which issues the CALL EXECUTE, or the second one, which is run by the CALL EXECUTE?

 

Or maybe all the hash tables are created without the condition being tested (normally they should not be).

 

Thanks in advance for the help.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

You re-declare the hash every time a condition is met. If you do not take care to completely remove the declaration when you're finished using it, "dead" hash objects will accumulate in memory.

Declare hash objects once (at _n_ = 1), and clear their contents as needed (using the CLEAR() method) if you need to fill them with different content. But from what I see, you just load a table, so you need to do that only once, and only use the FIND() or CHECK() methods later.
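
A minimal sketch of the declare-once pattern described above. The data set and variable names (work.lookup, work.transactions, id, label) are hypothetical and only serve the illustration:

```sas
/* Hypothetical sample data, for illustration only */
data work.lookup;
  do id = 1 to 5;
    label = cats("item", id);
    output;
  end;
run;

data work.transactions;
  do id = 2, 4, 6;
    output;
  end;
run;

data matched;
  /* Never executes; only puts id and label into the program data vector */
  if 0 then set work.lookup;

  /* Instantiate and load the hash exactly once per data step */
  if _n_ = 1 then do;
    declare hash h(dataset: "work.lookup");
    h.defineKey("id");
    h.defineData("label");
    h.defineDone();
  end;

  set work.transactions;

  /* find() returns 0 on a match and copies label into the PDV */
  if h.find() = 0 then output;
run;
```

Here the table is read into memory a single time, and every later iteration only performs a lookup against the already loaded instance.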


4 REPLIES 4
yabwon
Onyx | Level 15

Hi,

 

After rewriting your example into runnable code:

data a_table_with_1_million_rows;
  do key1 = 1 to 1e6;
     key2 = key1;
     key3 = key1;
     output;
  end;
run;

data param_table;
  col2 = "A"; col3 = "B";
  rule = "Something 1"; output;
  rule = "Something 2"; output;
run;


%macro my_macro(rule,param1,param2);

data _null_;

  if 0 then set a_table_with_1_million_rows;

  select(&rule);
    when("Something 1") 
      do;
        if (_N_ = 1) then do;
          dcl hash h1(dataset: "a_table_with_1_million_rows", hashexp:10);
          h1.defineKey("key1", "key2", "key3");
          h1.defineDone();

          dcl hash h2(dataset: "a_table_with_1_million_rows", hashexp:10);
          h2.defineKey("key1", "key2", "key3");
          h2.defineDone();

          /* <more code>*/
        end;
      end;
    when ("Something 2") 
      do;
        if ( _N_ = 1 ) then do;
          dcl hash h3(dataset: "a_table_with_1_million_rows", hashexp:10);
          h3.defineKey("key1", "key2", "key3");
          h3.defineDone();
        end;

        /* <more code>*/
      end;
    otherwise;
  end;

  stop;
run;

%mend my_macro;

data _null_;
    set param_table;
    call execute('%my_macro(rule="' || strip(rule) || '", param1=' || strip(col2) || ', param2=' || col3 || ")");
run;

Note that I've added "_N_ = 1" to the IF condition. As the log shows, each row of `param_table` loads a different number of data sets into hashes: the first row loads two hashes, the second row loads one:

 

NOTE: There were 2 observations read from the data set WORK.PARAM_TABLE.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              454.90k
      OS Memory           16884.00k
  

NOTE: CALL EXECUTE generated line.
1   + data _null_;    if 0 then set a_table_with_1_million_rows;    select("Something 1");     when("Something 1")
do;         if (_N_ = 1) then do;           dcl hash h1(dataset: "a_table_with_1_million_rows", hashexp:10);
h1.defineKey("key1",
2   + "key2", "key3");           h1.defineDone();            dcl hash h2(dataset: "a_table_with_1_million_rows",
hashexp:10);           h2.defineKey("key1", "key2", "key3");           h2.defineDone();                     end;
end;     when ("Something 2"
3   +)       do;         if ( _N_ = 1 ) then do;           dcl hash h3(dataset: "a_table_with_1_million_rows",
hashexp:10);           h3.defineKey("key1", "key2", "key3");           h3.defineDone();         end;                 end;
   otherwise;   end;
4   + stop; run;

NOTE: There were 1000000 observations read from the data set WORK.A_TABLE_WITH_1_MILLION_ROWS.
NOTE: There were 1000000 observations read from the data set WORK.A_TABLE_WITH_1_MILLION_ROWS.
NOTE: DATA statement used (Total process time):
      real time           0.66 seconds
      user cpu time       0.54 seconds
      system cpu time     0.10 seconds
      memory              164777.87k
      OS Memory           180276.00k
  


5   + data _null_;    if 0 then set a_table_with_1_million_rows;    select("Something 2");     when("Something 1")
do;         if (_N_ = 1) then do;           dcl hash h1(dataset: "a_table_with_1_million_rows", hashexp:10);
h1.defineKey("key1",
6   + "key2", "key3");           h1.defineDone();            dcl hash h2(dataset: "a_table_with_1_million_rows",
hashexp:10);           h2.defineKey("key1", "key2", "key3");           h2.defineDone();                     end;
end;     when ("Something 2"
7   +)       do;         if ( _N_ = 1 ) then do;           dcl hash h3(dataset: "a_table_with_1_million_rows",
hashexp:10);           h3.defineKey("key1", "key2", "key3");           h3.defineDone();         end;                 end;
   otherwise;   end;
8   + stop; run;

NOTE: There were 1000000 observations read from the data set WORK.A_TABLE_WITH_1_MILLION_ROWS.
NOTE: DATA statement used (Total process time):
      real time           0.33 seconds
      user cpu time       0.29 seconds
      system cpu time     0.04 seconds
      memory              82728.40k
      OS Memory           98580.00k
  

The "a_table_with_1_million_rows" data set in my example has only 3 variables and is ~23MB in size. For the first observation of `param_table`, the data step executed by call execute() uses 164777.87k of memory (two hashes); for the second observation, the second data step uses half of that (one hash). So the answer to your question is: each call execute generates a separate data step with its own hash tables, and each of those steps uses RAM separately.

 

Try to run your code with:

options fullstimer msglevel = i ;

turned on to see how much memory each step uses.

 

All the best

Bart

 

 

 

 

 


RichardDeVen
Barite | Level 11

As stated by others, you are instantiating and loading a hash for every row that meets a select criterion.  Each new instantiation overwrites the hash reference, so the memory associated with the prior instance still exists but is unreachable.  Congratulations, you have created a memory leak!
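
One way to avoid leaving unreachable instances behind, sketched under the assumption that the previous contents are no longer needed, is to call the hash DELETE method before creating a new instance. The example uses sashelp.class purely for illustration; the loop counter i and the variable n are hypothetical:

```sas
data _null_;
  /* Never executes; only puts name into the program data vector */
  if 0 then set sashelp.class(keep=name);

  /* Non-executable declaration: creates the reference, not an instance */
  declare hash h;

  do i = 1 to 3;
    /* Create and load a fresh instance on each pass */
    h = _new_ hash(dataset: "sashelp.class");
    h.defineKey("name");
    h.defineDone();

    n = h.num_items;
    put i= n=;

    /* Release this instance before the reference is reassigned */
    h.delete();
  end;

  stop;
run;
```

Without the delete() call, each pass through the loop would leave the previous instance allocated with nothing pointing at it.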

 

Your code indicates you want to load hash data only when it is needed according to some select condition (let's call this dynamic hash instance loading), versus pre-loading all possibly needed hash data (static hash instance loading).

 

Going down the dynamic path...

 

You can declare a hash in a non-executable statement

 

declare hash h;

But there is no way to test whether h is 'null', which would mean there is no hash instance associated with it.

 

Any attempt to use a hash method will log errors

 

ERROR: Uninitialized object at line #### column ##.
ERROR: DATA STEP Component Object failure.  Aborted during the EXECUTION phase.

You can declare an empty hash instance, but instantiation is an executable operation and needs to be guarded against repeated execution using an _n_=1 block.  

 

NUM_ITEMS=0 logic will tell you the hash has not been loaded (dynamically) yet.  If your hash data loading data set has 0 rows, that is another problem for you to deal with.

 

  if _n_ = 1 then do;
    * declare an 'empty' instance once but don't load.  num_items can be used and will return 0.
    * This takes up a tiny amount of memory;
    declare hash h();
  end;

The dynamic loading of a hash will create a new instance and populate it.

 

 

when (<condition>) do;
                if h.num_items=0 then do;  * dynamic loading per condition;
                  * load once;
                  h = _new_ hash (dataset:'<tablename>');
                  h.defineKey('<key-column>');
                  h.defineData('<data-column>');
                  h.defineDone();
                end;
... 
                ... some code with a h.<hash-method>() ...

 

Example code with three hash objects dynamically loaded

data lookup1;
  do key = 1 to 10;
    value = key**2;
    output;
  end;
run;

data lookup2;
  do key = 1 to 10;
    value = key**(1/2);
    output;
  end;
run;

data lookup3;
  do key = 1 to 10;
    value = 1e6+key;
    output;
  end;
run;

data have;
  input lookup_table key @@;
datalines;
1 2 3 4 1 3 2 4 5 1 1 1 2 2 3 3 3 10
;

data want;
  if _n_ = 1 then do;
    * declare an 'empty' instance once but don't load.  num_items will be 0.
    * This takes up a tiny amount of memory;
    declare hash lookup1();
    declare hash lookup2();
    declare hash lookup3();
  end;

  set have;

  select (lookup_table);
    when (1)  do;
               if lookup1.num_items=0 then do;
                  * load (once) hash data on requirement demand (the when condition);
                  lookup1 = _new_ hash (dataset:'lookup1');
                  lookup1.defineKey('key');
                  lookup1.defineData('value');
                  lookup1.defineDone();
                end;
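                * find() fills value on a match; assigning its return code to value would overwrite the result;
                if lookup1.find() ne 0 then call missing(value);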
              end;
    when (2)  do;
                if lookup2.num_items=0 then do;
                  * load once;
                  lookup2 = _new_ hash (dataset:'lookup2');
                  lookup2.defineKey('key');
                  lookup2.defineData('value');
                  lookup2.defineDone();
                end;
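                * find() fills value on a match; assigning its return code to value would overwrite the result;
                if lookup2.find() ne 0 then call missing(value);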
              end;
    when (3)  do;
                if lookup3.num_items=0 then do;
                  * load once;
                  lookup3 = _new_ hash (dataset:'lookup3');
                  lookup3.defineKey('key');
                  lookup3.defineData('value');
                  lookup3.defineDone();
                end;
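                * find() fills value on a match; assigning its return code to value would overwrite the result;
                if lookup3.find() ne 0 then call missing(value);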
              end;
    otherwise value = .;
  end;
run;

 

willi_m
Calcite | Level 5

Hello, thanks to all for your responses and clarifications. They were very helpful!

 

So in my code, I added the clause "if _n_=1" before declaring the hash tables, and it works perfectly now.

 

Regards.

 
