deleted_user
Not applicable
We recently deployed a new set of SAS workstations and are now running into a limit on how large a data set we can sort. The limit depends on data set size, not row count, and appears to be between 18 and 20 GB. We had no problem sorting these same files in our older 32-bit XP environment.

The new systems are 64-bit server-class machines running Windows 7 Professional, with 4 CPUs, 32 GB of RAM, and 150 GB of dedicated SAS work space on a RAID array. We are running SAS 9.2 (TS2M2).

When we sort files below the size at which the failures start, the sort is blazing fast. We can break the files up and ultimately get them sorted, but I would prefer to fix the root cause of the issue.

A sample of the errors from the log:
ERROR: Failure while attempting to write page 82 of sorted run 637.
ERROR: Failure while attempting to write page 526690 to utility file 1.
ERROR: Failure encountered while creating initial set of sorted runs.
ERROR: Failure encountered during external sort.
ERROR: Sort execution failure.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 205470721 observations read from the data set LPSERV.TEST1.
WARNING: The data set LPSERV.TEST1 may be incomplete. When this step was stopped there were 0
observations and 21 variables.
WARNING: Data set LPSERV.TEST1 was not replaced because this step was stopped.
NOTE: PROCEDURE SORT used (Total process time):
real time 26:17.38
cpu time 14:44.46
3 REPLIES
Doc_Duke
Rhodochrosite | Level 12
First, are you running a supported version of SAS? Just saying TS2M2 isn't enough.
http://support.sas.com/kb/34/569.html
If so, this is probably worth a call to tech support.
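Before calling, it may be worth capturing where SAS writes its temporary sort files and how the sort is configured. The following is a rough sketch rather than a diagnosis: the BY variable id and the output name TEST1_SORTED are placeholders, since the original PROC SORT step wasn't posted, and whether TAGSORT helps here is an assumption to test.

```sas
/* Show where the WORK library and the sort utility files live.   */
/* UTILLOC defaults to the WORK location unless it has been set.  */
proc options option=work;    run;
proc options option=utilloc; run;

/* List the sort-related system options (SORTSIZE and friends). */
proc options group=sort; run;

/* Retry with TAGSORT, which keeps only the BY variables and       */
/* observation numbers in temporary files. It needs less temporary */
/* disk space but can take much longer to run.                     */
proc sort data=lpserv.test1 out=lpserv.test1_sorted tagsort;
  by id;   /* hypothetical BY variable */
run;
```

If the TAGSORT run gets past the point where the normal sort fails, that would suggest the utility file location is the thing to look at; if it fails the same way, the log from both runs is useful material for tech support.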
greenbergmeth
Calcite | Level 5

PROC SORT with NODUPKEY will always return the physical first record of each BY group; i.e., with the data as listed, c=71 will always be kept. PROC SQL will not necessarily return any particular record. You could ask for min or max, but you could not guarantee getting the first record in sort order regardless of how you wrote the query, because SQL will often re-sort the data as needed to accomplish the query as efficiently as possible.

They will be identical insofar as they both return the same number of records, if that is your concern.

You cannot accomplish exactly the same thing in a straightforward manner in SQL. Because SQL doesn't have a concept of row ordering, you would either need a rule for choosing which c to keep (max(c), min(c), etc.) or have to add a row counter and choose the lowest value of that.

For example:

data work.dataset;
  input a b c;
  rowcounter=_n_;
  datalines;
27 93 71
27 93 72
46 68 75
55 55 33
46 68 68
34 34 32
45 67 88
56 75 22
34 34 32
;
run;

proc sql;
  select a, b, min(rowcounter*100+c)-min(rowcounter*100) as c
  from work.dataset
  group by a, b;
quit;

That's using a cheat: rowcounter*100 always dominates the size of c, so the minimum of the sum comes from the first row in each group. Of course, if your c doesn't have values appropriate for that (here, anything negative or 100 and above), this won't work and you're better off merging it on separately.
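For the "merge it on separately" route, something like the following might do. It is only a sketch, and FIRST_ROWS and FIRST_ROW are made-up names: it finds the lowest rowcounter in each (a, b) group and joins back to fetch that row's c, so it doesn't depend on the range of c.

```sas
proc sql;
  /* Inner query: the first physical row number in each (a, b) group. */
  /* Outer join: pull c from exactly that row.                        */
  create table work.first_rows as
  select d.a, d.b, d.c
  from work.dataset as d
       inner join
       (select a, b, min(rowcounter) as first_row
          from work.dataset
          group by a, b) as f
       on  d.a = f.a
       and d.b = f.b
       and d.rowcounter = f.first_row
  order by d.a, d.b;
quit;
```

This still relies on the rowcounter variable created in the data step above, since SQL on its own has no notion of which row came first.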

If you are interested in the SQL solution, consider posting that explicitly as a separate question, as the SQL-only folk will then answer it.
LinusH
Tourmaline | Level 20
I think you are a candidate for a server-based architecture. It gives you more power, is easier to maintain, and encourages cooperation.
If that's not possible, start with an upgrade; that shouldn't be hard with stand-alone installations (and stand-alone data?).
Data never sleeps

