DATA Step, Macro, Functions and more

SORT 70G of data

Occasional Contributor
Posts: 5

SORT 70G of data

Hello,

 

When I try to sort a table of 70Go, I get a "DISK full" message...

 

But my WORK has 903Go of space...

 

Normally I would expect a SORT to use no more than about 70Go × 3 of space.

Do you have an idea why I'm running into this kind of problem?

 

The table is like this :

100,561,233 rows
270 columns
Observation length: 14,284 bytes
Compressed (CHAR)

 

To work around the problem, I'm currently trying TAGSORT; I will also try splitting the data into 10 sub-tables.
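
Here is the kind of step I'm trying (the table and BY variable names are just placeholders):

proc sort data=work.bigtable tagsort;   /* placeholder table name */
   by id;                               /* placeholder sort key   */
run;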




All Replies
Trusted Advisor
Posts: 1,584

Re: SORT 70G of data

Posted in reply to pinkY2229

There are probably other datasets in WORK from previous steps and/or from parallel sessions running at the same time.

 

One solution (maybe not the best) is to split the 70 GB dataset into several smaller ones, sort each separately, and finally merge them back together, as sketched below.
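
A minimal sketch of the idea (all dataset and variable names are made up):

/* 1. split the big table into parts, e.g. on ranges of the sort key */
data part1 part2;
   set work.big;
   if id <= 50000000 then output part1;
   else output part2;
run;

/* 2. sort each part separately */
proc sort data=part1; by id; run;
proc sort data=part2; by id; run;

/* 3. interleave the sorted parts back into one sorted table */
data work.big_sorted;
   set part1 part2;
   by id;
run;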

Super User
Posts: 7,988

Re: SORT 70G of data

Posted in reply to pinkY2229

What is 70Go?  

 

Some things to check: do you have anything else in WORK?

 

How are you sorting the data?

 

Where does the data reside: on a network or locally?

Occasional Contributor
Posts: 5

Re: SORT 70G of data

Thank you for your message.

 

The size is 70 GB (gigabytes).

I have nothing else in my WORK: I'm monitoring it every minute with df (I'm on AIX), and I can see only my WORK folder growing and growing...
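
For example (the mount point is just my setup):

df -g /saswork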

 

The table is already in WORK, and I want to sort it.


Super User
Posts: 7,988

Re: SORT 70G of data

Posted in reply to pinkY2229

Perhaps these articles will help, particularly the SORTSIZE option:

http://support.sas.com/documentation/cdl/en/hostunx/67929/HTML/default/viewer.htm#n1svfqjvm6a2sfn1ab...

http://support.sas.com/documentation/cdl/en/hostwin/67962/HTML/default/viewer.htm#n0ea63jfjic0vpn15d...
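
For example, SORTSIZE can be raised like this (the 2G value is only an illustration; use what your site allows):

options sortsize=2G;

or for a single step:

proc sort data=work.bigtable sortsize=2G;   /* table name is a placeholder */
   by id;
run;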

 

Not sure how you plan on working with that data; it's a big chunk any way you look at it. (Personally, I wouldn't be happy with 270 columns; anything more than 20 or so gets difficult programming-wise.)

 

Super User
Posts: 19,867

Re: SORT 70G of data

Posted in reply to pinkY2229

Are you sure you have access to full size? IT can often limit work space. 

 

Also, what's the exact error you receive?

Solution
10-14-2016 06:52 AM
Super User
Posts: 7,854

Re: SORT 70G of data

Posted in reply to pinkY2229

The utility file of a PROC SORT is not compressed, so you can deduce its size from the size of the compressed file and the compression factor shown in the log when you create it.

Depending on the contents, your compression ratio could well be above 90%, and then it is no surprise you run out of space.

Using tagsort is the right remedy.

I'd also consider adding additional disk space that is physically separate from your WORK, and set UTILLOC to it. That prevents concurrent read and write on the WORK disks.
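
For example, in the sasv9.cfg file or on the command line at startup (the path is just an example):

-utilloc "/sasutil"

(UTILLOC can only be set when SAS starts, not during a session.)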

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 5

Re: SORT 70G of data

Posted in reply to KurtBremser

Indeed,

I have a compression ratio of 95.43 percent (!?).

Uncompressed, the 70Go comes to something below 700 GB, and 700 GB × 3 is well above my 900 GB WORK folder.

 

TAGSORT did the job:

NOTE: PROCEDURE SORT used (Total process time):
      real time           1:41:46.94
      user cpu time       27:57.53
      system cpu time     4:36.74
      memory              266864.43k
      OS Memory           272352.00k
Super User
Posts: 7,854

Re: SORT 70G of data

Posted in reply to pinkY2229

Actually, your compressed file is less than 5% of its uncompressed size. So the uncompressed file would be roughly 20 × 70 GB, which amounts to 1.4 TB(!).
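
The arithmetic, spelled out (the figures come straight from the compression ratio and table specs above):

data _null_;
   ratio  = 0.9543;                /* compression ratio from the log           */
   uncomp = 70 / (1 - ratio);      /* estimated uncompressed size in GB        */
   put uncomp= 'GB';               /* about 1532 GB                            */
   bytes  = 14284 * 100561233;     /* cross-check: obs length x rows           */
   put bytes= ;                    /* about 1.44e12 bytes, i.e. roughly 1.4 TB */
run;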

When sorting large compressed datasets with a considerable compression rate (>80%), I always use TAGSORT, just to prevent a disk full condition in my UTILLOC.

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 10,044

Re: SORT 70G of data

Posted in reply to pinkY2229

For a big table, I would split it into smaller tables and combine them back together later. Like:

 

data F M;
 set sashelp.class;
 if sex='F' then output F;      /* split into one table per group */
 else if sex='M' then output M;
run;

data want;
 set F M;                       /* concatenate the pieces back into one table */
run;