Hi there,
In order to get the benefit of hash object programming for look up, I am trying to get a subset of a data having 3 million records based on the records with unique identifier present in another table with 1.3 million records. While doing so, getting an error message :
Insufficient memory to data step program. The SAS system stopped processing this step because of insufficient memory.
Can anybody suggest me to overcome the error or an alternative means to improve efficiency.
Thank you in advance for your kind reply.
Regards,
You want to match records in table1 and table2 by id, and keep only those with id's in both tables, by using a hash object for table 1. Your program assumes there is only 1 record per ID in table1. It makes no assumption about table 2.
You ran into a memory problem storing table 1 into a hash object. There are two solutions: (1) increase memory available to the program, or (2) reduce memory demanded by hash.
declare hash hh(dataset:"table1 (keep=abserfid abstract behavior comment dateupdated grade histology laterality id site test)");
hh.definekey ('ID');
hh.definedata(all:'Y');
hh.definedone();
data rid /view=rid; /* Make pointers - record IDs*/
set table1 (keep=id );
rid=_n_;
run;
data want;
if _n_=1 then declare hash hh (dataset:'rid');
hh.definekey('id');
hh.definedata(all:'Y');
hh.definedone();
end;
set table2;
rc=hh.find();
if rc=0;
set table1 (keep=id abstract behavior ......) point=rid;
run;
Show us your code
data test1.table3 (drop=rc);
length Abstract $32767 Behavior $1 Comment $32767 Grade $1 Histology $4 Laterality $1 Site $4 TEST $20 ;
declare hash hh();
rc=hh.definekey('ID');
rc=hh.definedata('AbsrefID', 'Abstract', 'Behavior', 'Comment', 'DATEUPDATED', 'Grade', 'Histology', 'Laterality', 'ID', 'Site', 'TEST');
rc=hh.definedone();
do until (eof1);
set test1.table1 end=eof1;
rc = hh.add();
END;
DO UNTIL (eof2);
SET test2.table2 END=eof2;
rc = hh.find();
IF rc EQ 0 THEN OUTPUT;
end;
stop;
run;
LOG:
ERROR: Hash object added 1572848 items when memory failure occurred.
FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.
ERROR: The SAS System stopped processing this step because of insufficient memory.
NOTE: There were 1572850 observations read from the data set test1.table1.
NOTE: There were 1 observations read from the data set test2.table2.
WARNING: The data set test1.table3 may be incomplete. When this step was stopped there were 0 observations and 27 variables.
You want to match records in table1 and table2 by id, and keep only those with id's in both tables, by using a hash object for table 1. Your program assumes there is only 1 record per ID in table1. It makes no assumption about table 2.
You ran into a memory problem storing table 1 into a hash object. There are two solutions: (1) increase memory available to the program, or (2) reduce memory demanded by hash.
declare hash hh(dataset:"table1 (keep=abserfid abstract behavior comment dateupdated grade histology laterality id site test)");
hh.definekey ('ID');
hh.definedata(all:'Y');
hh.definedone();
data rid /view=rid; /* Make pointers - record IDs*/
set table1 (keep=id );
rid=_n_;
run;
data want;
if _n_=1 then declare hash hh (dataset:'rid');
hh.definekey('id');
hh.definedata(all:'Y');
hh.definedone();
end;
set table2;
rc=hh.find();
if rc=0;
set table1 (keep=id abstract behavior ......) point=rid;
run;
Your table TABLE1 contains 2 variables with 32KB length so no wonder you run out of memory.
What's in your table TABLE2? If this one is smaller then may-be load this one into the hash. And if the relationship between your tables is one-to-many then use the multidata option in the declare statement and the do_over() method when reading from the hash.
And just FYI: If you just want to load a table into a hash then you don't need to loop over the table but you can load it directly using code like: dcl hash hh(dataset:'test1.table1'); ....
Impossible to suggest something without seeing the code and some knowledge about the data and setup of your sas environment. For the later please execute
proc options group=memory;
run;
and post the result using {i} icon.
proc options group=memory;
run;
LOG shows:
SAS (r) Proprietary Software Release 9.4 TS1M3
Group=MEMORY
SORTSIZE=1073741824
Specifies the amount of memory that is available to the SORT procedure.
SUMSIZE=0 Specifies a limit on the amount of memory that is available for data
summarization procedures when class variables are active.
MAXMEMQUERY=0 Specifies the maximum amount of memory that is allocated for procedures.
MEMBLKSZ=16777216 Specifies the memory block size for Windows memory-based libraries.
MEMMAXSZ=2147483648
Specifies the maximum amount of memory to allocate for using memory-based
libraries.
LOADMEMSIZE=0 Specifies a suggested amount of memory that is needed for executable
programs loaded by SAS.
MEMSIZE=107213322240
Specifies the limit on the amount of virtual memory that can be used during
a SAS session.
REALMEMSIZE=0 Specifies the amount of real memory SAS can expect to allocate.
NOTE: PROCEDURE OPTIONS used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.