Solved: Re: Hash object with Insufficient memory to execute data step program

DeepakSwain · Posted 02-27-2018 09:49 AM

Hi there,

In order to get the benefit of hash object programming for look up, I am trying to get a subset of a data having 3 million records based on the records with unique identifier present in another table with 1.3 million records. While doing so, getting an error message :

Insufficient memory to data step program. The SAS system stopped processing this step because of insufficient memory.

Can anybody suggest me to overcome the error or an alternative means to improve efficiency.

Thank you in advance for your kind reply.

Regards,

Swain

mkeintz · Posted 02-27-2018 02:01 PM

You want to match records in table1 and table2 by id, and keep only those with id's in both tables, by using a hash object for table 1. Your program assumes there is only 1 record per ID in table1. It makes no assumption about table 2.

You ran into a memory problem storing table 1 into a hash object. There are two solutions: (1) increase memory available to the program, or (2) reduce memory demanded by hash.

Increase memory
On my windows machine, if I run "proc options group=memory;run;" as suggested by others, it shows "memsize=2147483648" (2 gigabytes). You could start your sas session with more memory. For instance, if I click on the Start button and type
"sas -memsize 4g"
I have double the memory.
Decreasing hash memory demands.
1. If table2 requires less memory than table1 and has only 1 observations per ID, then reverse the roles of table 1 and table 2. How do you determine the memory required? Load a single record into the object (termed a single "data item" in hash lexicon), get the itemsize attribute ("bytes_per_item=hh.itemsize;") and multiply by the number of unique ID's.
  
  By the way, I would not ADD one record at a time. Instead use this declare statement to load it all at once:
```
declare hash hh(dataset:"table1 (keep=abserfid abstract behavior comment dateupdated grade histology laterality id site test)");
  hh.definekey ('ID');
  hh.definedata(all:'Y');
  hh.definedone();
```
2. Still too much memory needed? You could then use the hash object merely as a pointer to a record id that could then be used in a "set .... POINT= " statement, as here (untested).
```
data rid /view=rid;  /* Make pointers - record IDs*/
   set table1 (keep=id );
   rid=_n_;
run;

data want;
   if _n_=1 then declare hash hh (dataset:'rid');
      hh.definekey('id');
      hh.definedata(all:'Y');
      hh.definedone();
   end;
   set table2;
   rc=hh.find();
   if rc=0;
   set table1 (keep=id abstract behavior ......) point=rid;
run;      
```

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

PeterClemmensen · Posted 02-27-2018 09:51 AM

Show us your code

The DATA to DATA Step Macro
Blog: SASnrd

DeepakSwain · Posted 02-27-2018 01:03 PM

data test1.table3 (drop=rc);
length Abstract $32767 Behavior $1 Comment $32767 Grade $1 Histology $4 Laterality $1 Site $4 TEST $20 ;

declare hash hh();
rc=hh.definekey('ID');
rc=hh.definedata('AbsrefID', 'Abstract', 'Behavior', 'Comment', 'DATEUPDATED', 'Grade', 'Histology', 'Laterality', 'ID', 'Site', 'TEST');
rc=hh.definedone();

do until (eof1);
set test1.table1 end=eof1;
rc = hh.add();
END;

DO UNTIL (eof2);
SET test2.table2 END=eof2;
rc = hh.find();
IF rc EQ 0 THEN OUTPUT;
end;
stop;
run;

LOG:

ERROR: Hash object added 1572848 items when memory failure occurred.
FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.
ERROR: The SAS System stopped processing this step because of insufficient memory.
NOTE: There were 1572850 observations read from the data set test1.table1.
NOTE: There were 1 observations read from the data set test2.table2.
WARNING: The data set test1.table3 may be incomplete. When this step was stopped there were 0 observations and 27 variables.

Swain

Satish_Parida · Posted 02-27-2018 01:11 PM

1. Please do not use the table with transaction data(heavier tables) as hash table.
2. The hash is called hash mapping. So mapping tables are used as hash tables.
3. Your program is crashing because, a certain amount of space is allocated for HASH tables, which is breached. We used SQL and indexes instead when we had these issues.

mkeintz · Posted 02-27-2018 02:01 PM

You want to match records in table1 and table2 by id, and keep only those with id's in both tables, by using a hash object for table 1. Your program assumes there is only 1 record per ID in table1. It makes no assumption about table 2.

You ran into a memory problem storing table 1 into a hash object. There are two solutions: (1) increase memory available to the program, or (2) reduce memory demanded by hash.

Increase memory
On my windows machine, if I run "proc options group=memory;run;" as suggested by others, it shows "memsize=2147483648" (2 gigabytes). You could start your sas session with more memory. For instance, if I click on the Start button and type
"sas -memsize 4g"
I have double the memory.
Decreasing hash memory demands.
1. If table2 requires less memory than table1 and has only 1 observations per ID, then reverse the roles of table 1 and table 2. How do you determine the memory required? Load a single record into the object (termed a single "data item" in hash lexicon), get the itemsize attribute ("bytes_per_item=hh.itemsize;") and multiply by the number of unique ID's.
  
  By the way, I would not ADD one record at a time. Instead use this declare statement to load it all at once:
```
declare hash hh(dataset:"table1 (keep=abserfid abstract behavior comment dateupdated grade histology laterality id site test)");
  hh.definekey ('ID');
  hh.definedata(all:'Y');
  hh.definedone();
```
2. Still too much memory needed? You could then use the hash object merely as a pointer to a record id that could then be used in a "set .... POINT= " statement, as here (untested).
```
data rid /view=rid;  /* Make pointers - record IDs*/
   set table1 (keep=id );
   rid=_n_;
run;

data want;
   if _n_=1 then declare hash hh (dataset:'rid');
      hh.definekey('id');
      hh.definedata(all:'Y');
      hh.definedone();
   end;
   set table2;
   rc=hh.find();
   if rc=0;
   set table1 (keep=id abstract behavior ......) point=rid;
run;      
```

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Patrick · Posted 02-27-2018 03:06 PM

Your table TABLE1 contains 2 variables with 32KB length so no wonder you run out of memory.

What's in your table TABLE2? If this one is smaller then may-be load this one into the hash. And if the relationship between your tables is one-to-many then use the multidata option in the declare statement and the do_over() method when reading from the hash.

And just FYI: If you just want to load a table into a hash then you don't need to loop over the table but you can load it directly using code like: dcl hash hh(dataset:'test1.table1'); ....

andreas_lds · Posted 02-27-2018 11:05 AM

Impossible to suggest something without seeing the code and some knowledge about the data and setup of your sas environment. For the later please execute

proc options group=memory;
run;

and post the result using {i} icon.

DeepakSwain · Posted 02-27-2018 01:08 PM

proc options group=memory;
run;

LOG shows:

SAS (r) Proprietary Software Release 9.4 TS1M3

Group=MEMORY
SORTSIZE=1073741824
Specifies the amount of memory that is available to the SORT procedure.
SUMSIZE=0 Specifies a limit on the amount of memory that is available for data
summarization procedures when class variables are active.
MAXMEMQUERY=0 Specifies the maximum amount of memory that is allocated for procedures.
MEMBLKSZ=16777216 Specifies the memory block size for Windows memory-based libraries.
MEMMAXSZ=2147483648
Specifies the maximum amount of memory to allocate for using memory-based
libraries.
LOADMEMSIZE=0 Specifies a suggested amount of memory that is needed for executable
programs loaded by SAS.
MEMSIZE=107213322240
Specifies the limit on the amount of virtual memory that can be used during
a SAS session.
REALMEMSIZE=0 Specifies the amount of real memory SAS can expect to allocate.
NOTE: PROCEDURE OPTIONS used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

Swain

Classroom Training Available!