Subbarao
Fluorite | Level 6

Hi,

I have a SAS dataset on the mainframe. Sorting it with PROC SORT takes 8 hours, as it has 40 million records. I have tried the TAGSORT and THREADS options, but they made no difference.

Can anybody tell me an efficient way to sort this dataset?

Thank you.

carl_sommer
SAS Employee

Are you trying to run this in interactive SAS or in a batch job? My guess is that you are contending for operating system resources / scheduling, especially if you are running interactively. This is a more appropriate activity for a batch job (and then you can look at things like region size, etc.).

Also, look at the following SAS options:

SORTPGM=

SORTSIZE=

My guess is that you have a sort package on your mainframe, and thus SORTPGM=HOST is the appropriate option setting. (I'm dredging this up from memory.)
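A minimal sketch of how these two options might be checked and set in a batch job. The 256M value is only an assumed example, not a recommendation; a sensible SORTSIZE= depends on the region size of the job, and your site may restrict what you can change. LARGE, LIB.SORTED and BY_VARS are placeholder names.

```sas
/* Show the current settings first */
proc options option=sortpgm;
run;

proc options option=sortsize;
run;

/* Ask SAS to hand the sort to the host sort utility.
   256M is an assumed example value; tune it to the job's region size. */
options sortpgm=host sortsize=256M;

/* Placeholder data set and BY variable names */
proc sort data=large out=lib.sorted;
  by by_vars;
run;
```

Comparing the real time and CPU time reported in the log before and after the change should show whether the host sort helps.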

Carl

SASKiwi
PROC Star

Apart from Carl's suggestions, you could split the data in two and sort each half in parallel, in two separate programs running at the same time. That should nearly halve the processing time.

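Sort program1:

/* half_total_obs is a macro variable holding half the row count,
   e.g. %let half_total_obs = 20000000; */
proc sort data = large (obs = &half_total_obs)
          out = lib.half1;
  by by_vars;
run;

Sort program2:

/* Define the same half_total_obs macro variable in this program too */
proc sort data = large (firstobs = %eval(&half_total_obs + 1))
          out = lib.half2;
  by by_vars;
run;

Combine 2 sorts program:

/* SET with BY interleaves the two sorted halves,
   so WHOLE comes out in by_vars order */
data whole;
  set lib.half1 lib.half2;
  by by_vars;
run;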
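The half_total_obs value used in the two sort programs has to come from somewhere. One possible way to derive it, assuming LARGE is a native SAS data set on disk (so the row count is stored in its header) and using placeholder names throughout:

```sas
/* Read the row count from the data set header without reading any rows */
data _null_;
  if 0 then set large nobs=n;   /* NOBS= is filled in at compile time */
  call symputx('half_total_obs', ceil(n / 2));
  stop;
run;

/* Write the value to the log so it can be checked */
%put NOTE: half_total_obs=&half_total_obs;
```

Because the two sorts run as separate programs, each one would need to run this step (or be given the value) before its PROC SORT.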

Subbarao
Fluorite | Level 6

Thank You SASKiwi,

I have tried this approach on another dataset, but it did not help. I will try it on my dataset and let you know.

Subbarao
Fluorite | Level 6

Thank you Carl,

I will try the above options today and let you know the execution time.

Kurt_Bremser
Super User

Determine how large your data set is physically (sum of the variable lengths × number of records).

Depending on that, you might consider exporting the data to a flat file and sorting that externally (e.g. on Linux).

In my own experience, I have to say that SAS performance on z/OS is surprisingly bad; after migrating to a 2-CPU pSeries (with AIX), we noticed that it ran circles around the MF.
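A rough sketch of that size estimate, assuming the data set is called LARGE and lives in the WORK library (both placeholder names). It multiplies the stored observation length by the row count, so it ignores compression and page overhead.

```sas
/* DICTIONARY.TABLES stores libref and member names in upper case */
proc sql;
  select nobs,
         obslen,
         nobs * obslen as approx_bytes format=comma20.
    from dictionary.tables
   where libname = 'WORK'
     and memname = 'LARGE';
quit;
```

For 40 million records, every 100 bytes of observation length adds roughly 4 GB.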

Also keep in mind that SAS generates a utility file while sorting, and then writes the sorted data back.

You should make sure that the source and target of the sort are not located where your WORK library is (or the place where the UTILLOC system option points to).
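One way to see where these locations actually are, as a sketch. LIB is a placeholder libref for the library that holds the source or target data set.

```sas
/* Where SAS writes its utility files */
proc options option=utilloc;
run;

/* Physical locations of WORK and of the data library */
%put NOTE: WORK is at %sysfunc(pathname(work));
%put NOTE: LIB  is at %sysfunc(pathname(lib));
```

If the names written to the log point to the same volumes, the sort is reading, spilling and writing in the same place.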

From your description I think that you are heavily I/O bound; that's why the trick of dividing the data set did not make a difference.

Peter_C
Rhodochrosite | Level 12

Sometimes data are partially sorted but you need that final sorting exercise.

If that situation occurs, look for subsets that are already ordered. Removing these subsets from the whole might reduce the demand for sort work space. (Thinking about those: what sort work area sizes have you defined?)
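A related but different technique, sketched here on the assumption that your SAS release supports the PRESORTED option of PROC SORT (SAS 9.2 or later): SAS checks the input order first and only sorts if it has to. This helps only when the whole data set is already in BY order, not when just some subsets are. LARGE, LIB.SORTED and BY_VARS are placeholder names.

```sas
/* PRESORTED: check whether the rows are already in BY order
   and skip the sort itself if they are */
proc sort data=large out=lib.sorted presorted;
  by by_vars;
run;
```

The log reports whether the input was found to be in order.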

Check out the companion for SAS on your mainframe.

good luck

peterC

Subbarao
Fluorite | Level 6

Thank you Peter,

This is a nice answer, but my dataset has close to 60 columns. Of course I will check your option too and will let you know.

