BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DeepakSwain
Pyrite | Level 9

Hi there,

 

In order to get the benefit of hash object programming for look up, I am trying to get a subset of a data having 3 million records based on the records with unique identifier present in another table with 1.3 million records. While doing so, getting an error message :

Insufficient memory to data step program. The SAS system stopped processing this step because of insufficient memory.

 

Can anybody suggest me to overcome the error or an alternative means to improve efficiency. 

 

Thank you in advance for your kind reply.

 

Regards,

 

 

Swain
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You want to match records in table1 and table2 by id, and keep only those with id's in both tables, by using a hash object for table 1.  Your program assumes there is only 1 record per ID in table1.  It makes no assumption about table 2.

 

You ran into a memory problem storing table 1 into a hash object.  There are two solutions: (1) increase memory available to the program, or (2) reduce memory demanded by hash.

 

  1. Increase memory
    On my windows machine, if I run "proc options group=memory;run;" as suggested by others, it shows "memsize=2147483648"  (2 gigabytes).   You could start your sas session with more memory.  For instance, if I click on the Start button and type   
        "sas -memsize 4g"
    I have double the memory.

  2. Decreasing hash memory demands. 
    1. If table2 requires less memory than table1 and has only 1 observations per ID, then reverse the roles of table 1 and table 2.  How do you determine the memory required?   Load a single record into the object (termed a single "data item" in hash lexicon), get the itemsize attribute   ("bytes_per_item=hh.itemsize;") and multiply by the number of unique ID's.

      By the way, I would not ADD one record at a time.  Instead use this declare statement to load it all at once:

      declare hash hh(dataset:"table1 (keep=abserfid abstract behavior comment dateupdated grade histology laterality id site test)");
        hh.definekey ('ID');
        hh.definedata(all:'Y');
        hh.definedone();
      
    2. Still too much memory needed?  You could then use the hash object merely as a pointer to a record id that could then be used in a "set .... POINT= " statement, as here (untested).
      data rid /view=rid;  /* Make pointers - record IDs*/
         set table1 (keep=id );
         rid=_n_;
      run;
      
      data want;
         if _n_=1 then declare hash hh (dataset:'rid');
            hh.definekey('id');
            hh.definedata(all:'Y');
            hh.definedone();
         end;
         set table2;
         rc=hh.find();
         if rc=0;
         set table1 (keep=id abstract behavior ......) point=rid;
      run;      
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
DeepakSwain
Pyrite | Level 9
data test1.table3 (drop=rc);
length Abstract $32767 Behavior $1 Comment $32767 Grade $1 Histology $4 Laterality $1 Site $4 TEST $20 ;

declare hash hh();
rc=hh.definekey('ID');
rc=hh.definedata('AbsrefID', 'Abstract', 'Behavior', 'Comment', 'DATEUPDATED', 'Grade', 'Histology', 'Laterality', 'ID', 'Site', 'TEST');
rc=hh.definedone();

do until (eof1);
set test1.table1 end=eof1;
rc = hh.add();
END;

DO UNTIL (eof2);
SET test2.table2 END=eof2;
rc = hh.find();
IF rc EQ 0 THEN OUTPUT;
end;
stop;
run;

LOG:

ERROR: Hash object added 1572848 items when memory failure occurred.
FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.
ERROR: The SAS System stopped processing this step because of insufficient memory.
NOTE: There were 1572850 observations read from the data set test1.table1.
NOTE: There were 1 observations read from the data set test2.table2.
WARNING: The data set test1.table3 may be incomplete. When this step was stopped there were 0 observations and 27 variables.

Swain
Satish_Parida
Lapis Lazuli | Level 10
1. Please do not use the table with transaction data(heavier tables) as hash table.
2. The hash is called hash mapping. So mapping tables are used as hash tables.
3. Your program is crashing because, a certain amount of space is allocated for HASH tables, which is breached. We used SQL and indexes instead when we had these issues.
mkeintz
PROC Star

You want to match records in table1 and table2 by id, and keep only those with id's in both tables, by using a hash object for table 1.  Your program assumes there is only 1 record per ID in table1.  It makes no assumption about table 2.

 

You ran into a memory problem storing table 1 into a hash object.  There are two solutions: (1) increase memory available to the program, or (2) reduce memory demanded by hash.

 

  1. Increase memory
    On my windows machine, if I run "proc options group=memory;run;" as suggested by others, it shows "memsize=2147483648"  (2 gigabytes).   You could start your sas session with more memory.  For instance, if I click on the Start button and type   
        "sas -memsize 4g"
    I have double the memory.

  2. Decreasing hash memory demands. 
    1. If table2 requires less memory than table1 and has only 1 observations per ID, then reverse the roles of table 1 and table 2.  How do you determine the memory required?   Load a single record into the object (termed a single "data item" in hash lexicon), get the itemsize attribute   ("bytes_per_item=hh.itemsize;") and multiply by the number of unique ID's.

      By the way, I would not ADD one record at a time.  Instead use this declare statement to load it all at once:

      declare hash hh(dataset:"table1 (keep=abserfid abstract behavior comment dateupdated grade histology laterality id site test)");
        hh.definekey ('ID');
        hh.definedata(all:'Y');
        hh.definedone();
      
    2. Still too much memory needed?  You could then use the hash object merely as a pointer to a record id that could then be used in a "set .... POINT= " statement, as here (untested).
      data rid /view=rid;  /* Make pointers - record IDs*/
         set table1 (keep=id );
         rid=_n_;
      run;
      
      data want;
         if _n_=1 then declare hash hh (dataset:'rid');
            hh.definekey('id');
            hh.definedata(all:'Y');
            hh.definedone();
         end;
         set table2;
         rc=hh.find();
         if rc=0;
         set table1 (keep=id abstract behavior ......) point=rid;
      run;      
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

Your table TABLE1 contains 2 variables with 32KB length so no wonder you run out of memory.

 

What's in your table TABLE2? If this one is smaller then may-be load this one into the hash. And if the relationship between your tables is one-to-many then use the multidata option in the declare statement and the do_over() method when reading from the hash.

 

And just FYI: If you just want to load a table into a hash then you don't need to loop over the table but you can load it directly using code like: dcl hash hh(dataset:'test1.table1'); ....

 

 

andreas_lds
Jade | Level 19

Impossible to suggest something without seeing the code and some knowledge about the data and setup of your sas environment. For the later please execute

proc options group=memory;
run;

and post the result using {i} icon.

DeepakSwain
Pyrite | Level 9
proc options group=memory;
run;

LOG shows:

 

SAS (r) Proprietary Software Release 9.4 TS1M3


Group=MEMORY
SORTSIZE=1073741824
Specifies the amount of memory that is available to the SORT procedure.
SUMSIZE=0 Specifies a limit on the amount of memory that is available for data
summarization procedures when class variables are active.
MAXMEMQUERY=0 Specifies the maximum amount of memory that is allocated for procedures.
MEMBLKSZ=16777216 Specifies the memory block size for Windows memory-based libraries.
MEMMAXSZ=2147483648
Specifies the maximum amount of memory to allocate for using memory-based
libraries.
LOADMEMSIZE=0 Specifies a suggested amount of memory that is needed for executable
programs loaded by SAS.
MEMSIZE=107213322240
Specifies the limit on the amount of virtual memory that can be used during
a SAS session.
REALMEMSIZE=0 Specifies the amount of real memory SAS can expect to allocate.
NOTE: PROCEDURE OPTIONS used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

 

Swain

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 16459 views
  • 6 likes
  • 6 in conversation