archana
Fluorite | Level 6

Hi,

I am trying to sort a large dataset with 367538640 rows. The sort fails because it runs out of disk space. I tried options compress=yes and TAGSORT, but TAGSORT takes a very long time. Please suggest any alternatives.

Thanks,


3 REPLIES
JacobSimonsen
Barite | Level 11

The likely problem is that SAS creates a copy of the dataset, which overwrites the original data when the sort procedure finishes. Therefore, if there is more free space on the drive where permanent data is supposed to be saved, you can tell SAS to use that drive as the work directory. But remember to change it back afterwards.

If you use Windows, you should add

-work "d:\path_to_temporary_workfolder"

to the command line you use to start SAS.

see the documentation here: http://support.sas.com/documentation/cdl/en/lesysoptsref/66899/HTML/default/viewer.htm#p1er6tm8fay8u...
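
To confirm which locations a running session actually uses, a quick check along these lines may help. It is only a sketch: it reads the current settings and writes them to the log, and changes nothing.

```sas
/* Print the configured WORK and UTILLOC option values to the log */
%put WORK option:    %sysfunc(getoption(work));
%put UTILLOC option: %sysfunc(getoption(utilloc));

/* Print the physical directory behind the WORK library of this session */
%put WORK directory: %sysfunc(pathname(work));
```

If the WORK directory still points at the small drive after a restart, the -work option was probably not picked up.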

jakarman
Barite | Level 11

see also: https://communities.sas.com/message/209847#209847

TAGSORT and compressing the final dataset will not help you much.

Sorting requires approximately 3 times the size of the original dataset as intermediate work space.

Overwriting the original dataset adds the need for one additional copy. You can redirect the intermediate work files to another location using the UTILLOC system option.

I am assuming you are using a server of some kind with a limited setup. About 367M records is a big number; what is the size of the dataset? If the record size is 100 bytes, it would be roughly 37 GB (367,538,640 × 100 bytes).
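
To replace the 100-byte guess with the real figure, the record length can be read from the dataset metadata. A minimal sketch, assuming the dataset is called XXX and lives in a library assigned as X1 (both are placeholder names, not taken from the original question):

```sas
/* Assumption: X1 and XXX are placeholders for the real library and dataset. */
/* nobs * obslen estimates the uncompressed size in bytes.                    */
proc sql;
  select nobs,
         obslen,
         nobs * obslen / 1e9 as approx_gb format=10.1
    from dictionary.tables
    where libname = 'X1' and memname = 'XXX';
quit;
```

LIBNAME and MEMNAME are stored in upper case in dictionary.tables, so the literals need to be upper case too.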

Unless your logic absolutely requires the sort, there may be better solutions to your original problem.

If you really do need the sort, you could split this big dataset into several smaller ones, sort each of them, and merge the sorted pieces in a dedicated step.

---->-- ja karman --<-----
Kurt_Bremser
Super User

UTILLOC in the configuration file allows you to specify a location different from WORK for the temporary sort file. This reduces the space required in the library that holds the file being sorted to 2x the file size (the original plus the sorted copy).

If you do

proc sort data=x1.xxx out=x2.xxx;

where x1 and x2 are libraries on different file systems, this may also help prevent an out-of-space condition, because you "only" need free space equal to the size of xxx once in the UTILLOC location and once in the x2 location.
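
Spelled out as a complete step, that could look roughly like the following. The two paths are hypothetical, and sortcrit stands in for whatever the real sort key is:

```sas
/* Assumption: both paths are hypothetical and sit on different file systems. */
libname x1 "d:\hypothetical_source";
libname x2 "e:\hypothetical_target";

/* Read from x1 and write the sorted copy to x2; x1.xxx is left untouched. */
proc sort data=x1.xxx out=x2.xxx;
  by sortcrit;   /* placeholder for the real sort key(s) */
run;
```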

Beyond that, I recommend what Jaap suggested: split the file, sort each partial file on its own, and then do:

data want;
  set
    have1
    have2
    ...
    haven
  ;
  by sortcrit;
run;

This is called interleaving; the sort order is preserved.
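
For illustration, the whole split, sort and interleave sequence could be sketched as follows for just two halves. The names x1.xxx, x2 and sortcrit are placeholders, and the split point assumes the 367538640 rows mentioned in the question; more pieces follow the same pattern.

```sas
/* Assumptions: x1.xxx, x2 and sortcrit are placeholders;      */
/* 367538640 rows are split into two halves of 183769320 each. */
proc sort data=x1.xxx(firstobs=1 obs=183769320) out=x2.have1;
  by sortcrit;
run;

proc sort data=x1.xxx(firstobs=183769321) out=x2.have2;
  by sortcrit;
run;

/* Interleave the sorted halves into one sorted dataset */
data x2.want;
  set x2.have1 x2.have2;
  by sortcrit;
run;

/* Drop the intermediate pieces once x2.want has been checked */
proc datasets library=x2 nolist;
  delete have1 have2;
quit;
```

Each PROC SORT then only needs utility space for one half at a time. The interleaving DATA step reads the pieces sequentially and does not sort, but x2 still has to hold the pieces and the final dataset together until the pieces are deleted.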
