05-07-2014 10:55 AM
I am trying to sort a large dataset with 367538640 rows. The sort fails because of a space issue. I tried the options COMPRESS=YES and TAGSORT, but TAGSORT takes a very long time. Please suggest any alternatives.
05-07-2014 12:54 PM
The likely problem is that SAS creates a copy of the dataset, which overwrites the original data when the sort procedure finishes. Therefore, if there is more space on the drive where permanent data is saved, you can tell SAS to use that drive as the work directory. But remember to change it back afterwards.
If you use Windows, you should add
in the command line from which you start SAS.
See the documentation here: http://support.sas.com/documentation/cdl/en/lesysoptsref/66899/HTML/default/viewer.htm#p1er6tm8fay8u...
05-07-2014 04:12 PM
TAGSORT and compressing the final dataset will not help you much.
Sorting requires approximately 3 times the size of the original dataset as intermediate work space.
Overwriting the original dataset adds the need for one additional copy. You can redirect the intermediate work files to another location using the UTILLOC system option.
I am assuming you are using a server of some kind with a limited setup. 365M records is a big number; what is the size of the dataset? At a record size of 100 bytes it would be about 36.5 GB.
Unless your logic absolutely requires the sort, there may be better solutions to your original problem.
If you really do need the sort, you could try splitting the big dataset into several smaller ones, sorting each one, and merging the sorted pieces in a dedicated step.
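To answer the size question, the row count and record length can be read from the dataset metadata. This is a minimal sketch, assuming the dataset lives in a library called X1 under the name XXX (both names are placeholders, not taken from the original post). The result is the uncompressed size, so a dataset stored with COMPRESS=YES will occupy less on disk.

```sas
/* X1 and XXX are hypothetical names; replace them with the real
   library and dataset. DICTIONARY.TABLES stores both in upper case. */
proc sql;
  select memname,
         nobs,                                   /* number of rows           */
         obslen,                                 /* record length in bytes   */
         nobs * obslen / 1e9 as approx_gb format=8.1  /* uncompressed size   */
    from dictionary.tables
   where libname = 'X1'
     and memname = 'XXX';
quit;
```

Multiplying approx_gb by the factors mentioned in this thread gives a rough idea of how much free space the sort will need.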
05-08-2014 01:33 AM
UTILLOC in the configuration file allows you to specify a location different from WORK for the temporary sort file. This will reduce the requirement for the file to be sorted to 2x.
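As an illustration, a configuration file entry on Windows might look like the fragment below. The path is purely hypothetical; it should point to a file system with enough free space for the temporary sort file.

```sas
/* Fragment of a SAS configuration file (for example sasv9.cfg).
   E:\sasutil is a made-up path used only for illustration. */
-UTILLOC "E:\sasutil"
```

UTILLOC can only be set when SAS starts. After startup, running `proc options option=utilloc; run;` writes the value in effect to the log.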
If you do
proc sort data=x1.xxx out=x2.xxx;
where x1 and x2 are libraries on different file systems, this may also help prevent an out-of-space condition, because you "only" need free space equal to the size of xxx once in the UTILLOC location and once in the x2 location.
Then I recommend what Jaap suggested: split the file, sort every partial file on its own, and then combine the sorted parts in a data step with a SET statement followed by a BY statement.
This is called interleaving; the sort order is preserved.
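A minimal sketch of the split, sort, and interleave approach follows. The library names x1 and x2 and the dataset name xxx follow the example above; the BY variable `key`, the part names, and the choice of two halves are assumptions made for illustration. The FIRSTOBS= and OBS= dataset options let PROC SORT read one slice of the input at a time, so no separate splitting step is needed.

```sas
/* key, part1, part2 and xxx_sorted are hypothetical names.
   367538640 rows split into two halves of 183769320 rows each. */

/* Sort the first half into its own dataset */
proc sort data=x1.xxx(firstobs=1 obs=183769320) out=x2.part1;
  by key;
run;

/* Sort the second half (OBS= defaults to MAX, so this reads to the end) */
proc sort data=x1.xxx(firstobs=183769321) out=x2.part2;
  by key;
run;

/* Interleave: SET with several sorted inputs plus BY keeps the result
   in key order without another sort */
data x2.xxx_sorted;
  set x2.part1 x2.part2;
  by key;
run;
```

Each PROC SORT now needs temporary space for only half of the data. If that is still too much, the same pattern works with more, smaller slices.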