Help using Base SAS procedures

Issue in sorting a large dataset



I am trying to sort a large dataset with 367,538,640 rows. The sort fails because of insufficient disk space. I tried OPTIONS COMPRESS=YES and the TAGSORT option, but TAGSORT takes a very long time. Please suggest any alternatives.


Super Contributor

Re: Issue in sorting a large dataset

The problem is likely that SAS writes a sorted copy of the dataset, and that copy replaces the original data only when the sort procedure has finished. Therefore, if your system has more free space on the drive where permanent data is saved, you can tell SAS to use that drive as its WORK directory. But remember to change it back afterwards.

If you use Windows, add

-work "d:\path_to_temporary_workfolder"

to the command line you use to start SAS.
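
After restarting SAS with -work, a quick way to confirm which folder the session really uses is the PATHNAME function. This is a small check, not part of the original advice; unless UTILLOC points somewhere else, PROC SORT's temporary utility files also go to this WORK location.

```sas
/* Print the physical folder behind the WORK library, e.g. to confirm
   that a -WORK override on the command line took effect. */
%put NOTE: WORK library location: %sysfunc(pathname(work));
```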

see the documentation here:

Trusted Advisor

Re: Issue in sorting a large dataset

see also:

TAGSORT and compressing the final dataset will not help you much.

Sorting requires approximately three times the size of the original dataset as intermediate work space.

Overwriting the original dataset adds the need for one additional copy. You can redirect the intermediate work to another location using the UTILLOC system option.

I am assuming you are using a server of some kind with a limited setup. About 368 million records is a big number; what is the size of the dataset? If a record is 100 bytes, it would be about 37 GB.
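
The arithmetic behind that estimate, combined with the rough "three times" rule above, can be sketched as a DATA step. The 100-byte record length is only the assumption from the post; the PROC SQL query reads the real row count and record length from DICTIONARY.TABLES, where WORK.HAVE is a placeholder name for your dataset. OBSLEN times NOBS ignores page overhead and compression, so treat it as an approximation.

```sas
/* Back-of-the-envelope space estimate.
   ASSUMPTION: 100-byte records; replace with your real record length. */
data size_est;
  nobs    = 367538640;            /* rows, from the question              */
  reclen  = 100;                  /* assumed bytes per row                */
  data_gb = nobs * reclen / 1e9;  /* size of the data itself, in GB       */
  sort_gb = 3 * data_gb;          /* rule of thumb: ~3x during the sort   */
  put data_gb= sort_gb=;
run;

/* For the real dataset: actual row count and record length.
   WORK.HAVE is a placeholder; memname must be upper case. */
proc sql;
  select nobs, obslen, nobs * obslen / 1e9 as est_gb format=8.2
  from dictionary.tables
  where libname = 'WORK' and memname = 'HAVE';
quit;
```

With these numbers the data alone is about 36.75 GB and the sort needs on the order of 110 GB.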

Unless your logical requirement truly needs the sort, there may be better solutions to your original problem.

If you really need this sort, you could split the big dataset into multiple smaller ones, sort each of them, and merge the sorted pieces in a dedicated step.
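
A minimal sketch of that split, sort and merge approach as a SAS macro. It is a variation: instead of first writing separate unsorted pieces, each PROC SORT reads one slice of the input directly through the FIRSTOBS= and OBS= dataset options, so sort utility space is needed for only one slice at a time. The sorted slices are then interleaved with SET and BY. The macro name SPLIT_SORT, its parameters, and the names HAVE, PIECE, WANT and SORTCRIT are hypothetical placeholders. Note that the sorted slices together still occupy as much space as the original, which is why the macro deletes them at the end; it assumes OUTPREFIX= names WORK datasets and that PIECES= is much smaller than the row count.

```sas
/* Hypothetical macro: sort DATA= in PIECES slices, then interleave them. */
%macro split_sort(data=, by=, pieces=4, outprefix=piece, out=);
  %local dsid nobs rc chunk i first last;

  /* Count the non-deleted observations in the input dataset */
  %let dsid = %sysfunc(open(&data));
  %let nobs = %sysfunc(attrn(&dsid, nlobs));
  %let rc   = %sysfunc(close(&dsid));

  /* Rows per slice, rounded up */
  %let chunk = %sysevalf(&nobs / &pieces, ceil);

  %do i = 1 %to &pieces;
    %let first = %eval((&i - 1) * &chunk + 1);
    /* The last slice always runs to the end of the dataset */
    %if &i < &pieces %then %let last = %eval(&i * &chunk);
    %else %let last = max;

    /* Sort one slice; only this slice needs utility space right now */
    proc sort data=&data(firstobs=&first obs=&last) out=&outprefix&i;
      by &by;
    run;
  %end;

  /* Interleave: SET with several sorted inputs plus BY keeps the order */
  data &out;
    set %do i = 1 %to &pieces; &outprefix&i %end;;
    by &by;
  run;

  /* Free the space used by the sorted slices (assumed to be in WORK) */
  proc datasets library=work nolist;
    delete %do i = 1 %to &pieces; &outprefix&i %end;;
  quit;
%mend split_sort;

/* Example call (placeholder names):
   %split_sort(data=work.have, by=sortcrit, pieces=4, outprefix=piece, out=work.want) */
```

More slices mean less utility space per PROC SORT, at the cost of more steps and a wider interleave.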

---->-- ja karman --<-----
Super User

Re: Issue in sorting a large dataset

Setting UTILLOC in the configuration file lets you specify a location other than WORK for the temporary sort (utility) file. This reduces the space needed on the WORK file system to about twice the size of the file being sorted.
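
UTILLOC can only be set when SAS starts (configuration file or command line), not from a running program. A small sketch: the config line is shown as a comment with a placeholder path, and the %PUT statements use GETOPTION to check what the current session actually uses.

```sas
/* In the SAS configuration file (placeholder path):
     -UTILLOC "e:\sas_utility"
   From a running session, check the current WORK and UTILLOC values: */
%put NOTE: WORK    = %sysfunc(getoption(work));
%put NOTE: UTILLOC = %sysfunc(getoption(utilloc));
```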

If you do

proc sort data=x1.xxx out=x2.xxx;
  by sortcrit;
run;

where x1 and x2 are libraries on different file systems, this may also help prevent an out-of-space condition, because you "only" need free space equal to the size of xxx once in the UTILLOC location and once in the x2 location.

Beyond that, I recommend what Jaap suggested: split the file, sort each part on its own, and then do:

data want;
  /* part1-part3 are placeholders for the separately sorted pieces */
  set part1 part2 part3;
  by sortcrit;
run;

This is called interleaving; because every piece is already sorted by sortcrit, the combined output keeps that sort order.
