Hello!
I am trying to sort a large dataset with a hash object, but I am getting a fatal error:
170 data dssxx; set dss0906 (obs=20000000); run;
NOTE: There were 20000000 observations read from the data set WORK.DSS0906.
NOTE: The data set WORK.DSSXX has 20000000 observations and 8 variables.
NOTE: Compressing data set WORK.DSSXX decreased size by 27.93 percent.
Compressed is 228803 pages; un-compressed would require 317461 pages.
NOTE: DATA statement used (Total process time):
real time 3:04.45
cpu time 1:19.01
171 data dss;
172 if 0 then set dssxx;
173 dcl hash hh (dataset: 'work.dssxx', hashexp: 0, ordered: 'd');
174 dcl hiter hi ('hh');
175 hh.definekey ('ss_kod', 'site', 'sbal_kod', 'data' );
176 hh.definedata ('site', 'sbal_kod', 'ss_kod', 'data', 'ss_ostd', 'ss_ostc');
177 hh.definedone();
178 do rc=hi.first() by 0 while(rc=0);
179 ost=ss_ostd-ss_ostc;
180 output;
181 rc=hi.next();
182 end;
183 drop rc ss_ostd ss_ostc ss_obd ss_obc; rename data=date;
184 stop;
185 run;
FATAL: Insufficient memory to execute data step program. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of insufficient memory.
WARNING: The data set WORK.DSS may be incomplete. When this step was stopped there were 0 observations and 5 variables.
WARNING: Data set WORK.DSS was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 41.31 seconds
cpu time 40.59 seconds
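For what it's worth, here is my rough estimate of the hash table's memory footprint. I am assuming all ten stored variables (4 keys + 6 data) are 8-byte numerics, and the 64 bytes of per-item overhead is purely my guess, not a documented number:

    data _null_;
        /* 4 key variables + 6 data variables, assumed 8 bytes each,
           plus an assumed ~64 bytes of per-item hash overhead */
        bytes_per_item = (4 + 6) * 8 + 64;
        gb = 20e6 * bytes_per_item / 1024**3;
        put 'Estimated hash size for 20M items: ' gb 5.2 ' GB';
    run;

That comes out to roughly 2.7 GB, which would not fit in the ~2 GB of address space a 32-bit process gets under Windows XP, so maybe the failure is to be expected?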
A smaller subset of the same data sorts fine with the same code:
186 data dssxx; set dss0906 ; where site ne 'MSK'; run;
NOTE: There were 4118570 observations read from the data set WORK.DSS0906.
WHERE site not = 'MSK';
NOTE: The data set WORK.DSSXX has 4118570 observations and 8 variables.
NOTE: Compressing data set WORK.DSSXX decreased size by 29.04 percent.
Compressed is 46390 pages; un-compressed would require 65375 pages.
NOTE: DATA statement used (Total process time):
real time 6:48.00
cpu time 1:08.98
187 data dss;
188 if 0 then set dssxx;
189 dcl hash hh (dataset: 'work.dssxx', hashexp: 0, ordered: 'd');
190 dcl hiter hi ('hh');
191 hh.definekey ('ss_kod', 'site', 'sbal_kod', 'data' );
192 hh.definedata ('site', 'sbal_kod', 'ss_kod', 'data', 'ss_ostd', 'ss_ostc');
193 hh.definedone();
194 do rc=hi.first() by 0 while(rc=0);
195 ost=ss_ostd-ss_ostc;
196 output;
197 rc=hi.next();
198 end;
199 drop rc ss_ostd ss_ostc ss_obd ss_obc; rename data=date;
200 stop;
201 run;
NOTE: There were 4118570 observations read from the data set WORK.DSSXX.
NOTE: The data set WORK.DSS has 4118570 observations and 5 variables.
NOTE: Compressing data set WORK.DSS increased size by 7.72 percent.
Compressed is 43927 pages; un-compressed would require 40779 pages.
NOTE: DATA statement used (Total process time):
real time 27.81 seconds
cpu time 18.78 seconds
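In case it helps to see the logic in isolation, here is the same hash-iterator pattern reduced to a self-contained sketch on sashelp.class (the dataset and variable names are just stand-ins for my real data):

    data sorted_class;
        if 0 then set sashelp.class;        /* define variable attributes at compile time */
        dcl hash hh (dataset: 'sashelp.class', ordered: 'd');
        dcl hiter hi ('hh');
        hh.definekey ('name');              /* descending sort key */
        hh.definedata ('name', 'age', 'height', 'weight');
        hh.definedone();
        do rc = hi.first() by 0 while (rc = 0);   /* walk items in key order */
            ratio = weight / height;        /* mirrors ost = ss_ostd - ss_ostc */
            output;
            rc = hi.next();
        end;
        drop rc sex;
        stop;
    run;

This small version behaves exactly as I expect, just like the 4-million-row run above.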
Does this mean that hashing is not suitable for large datasets, or is something wrong with my code?
The machine runs Windows XP with 2.5 GB of RAM and has enough free disk space.
Thanks for any thoughts.