Sort a huge dataset in Mainframe

Contributor
Posts: 39

Hi,

I have a SAS dataset on the mainframe with 40 million records. Sorting it with PROC SORT takes 8 hours. I have tried the TAGSORT and THREADS options, but they did not help.

Can anybody suggest a more efficient way to sort the dataset?

Thank you.

SAS Employee
Posts: 2

Re: Sort a huge dataset in Mainframe

Are you trying to run this in interactive SAS or in a batch job? My guess is that you are contending for operating system resources and scheduling, especially if you are running interactively. This is a more appropriate activity for a batch job (and then you can look at things like region size, etc.).

Also, look at the following SAS options:

SORTPGM=

SORTSIZE=

My guess is that you have a sort package on your mainframe, and thus SORTPGM=HOST is the appropriate option setting. (I'm dredging this up from memory.)
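As a rough sketch of how these options could be tried: the step below lists the current sort-related settings, asks SAS to use the host sort utility, and turns on detailed step statistics so runs can be compared. The data set names `large` and `lib.sorted` and the BY variable `by_vars` are placeholders, and a suitable SORTSIZE= value is site-specific, so none is set here.

```sas
/* Show the current values of the sort-related system options */
/* (SORTPGM=, SORTSIZE= and others appear in this group)      */
proc options group=sort;
run;

/* Ask SAS to pass the sort to the host sort utility */
options sortpgm=host;

/* Write detailed resource statistics for each step to the log */
options fullstimer;

/* Placeholder names: replace large, lib.sorted and by_vars */
proc sort data=large out=lib.sorted;
  by by_vars;
run;
```

Comparing the log statistics with and without SORTPGM=HOST should show whether the host sort actually helps at your site.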

Carl

Super User
Posts: 3,250

Re: Sort a huge dataset in Mainframe

Posted in reply to carl_sommer_sas_com

Apart from Carl's suggestions, you could split the data in two and sort each half in a separate program, with both programs running in parallel. That should nearly halve the processing time, provided the two jobs are not competing for the same I/O resources.

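Sort program 1:

/* Split point: half of the roughly 40 million records (adjust to your data) */
%let half_total_obs = 20000000;

proc sort data = large (obs = &half_total_obs)
          out  = lib.half1;
  by by_vars;
run;

Sort program 2:

/* Must be the same value as in program 1 */
%let half_total_obs = 20000000;

proc sort data = large (firstobs = %eval(&half_total_obs + 1))
          out  = lib.half2;
  by by_vars;
run;

Combine 2 sorts program:

/* SET with BY interleaves the two sorted halves, so the result stays in sorted order */
data whole;
  set lib.half1 lib.half2;
  by by_vars;
run;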
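The OBS= and FIRSTOBS= data set options need an actual number, so the half_total_obs split point has to be known before the sort runs. One possible way to compute it, assuming `large` is a SAS data set on disk for which the observation count is available in the header, is the NOBS= idiom:

```sas
/* Read the observation count from the data set header without  */
/* reading any records, and store half of it in a macro variable */
data _null_;
  if 0 then set large nobs=n;   /* never executes; NOBS= is set at compile time */
  call symputx('half_total_obs', ceil(n / 2));
  stop;
run;

%put NOTE: split point is &half_total_obs;
```

Each of the two sort programs would need to run this step (or receive the value some other way), since macro variables are not shared between separate jobs.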

Contributor
Posts: 39

Re: Sort a huge dataset in Mainframe

Thank you SASKiwi,

I have tried this approach on another dataset, but it did not help. I will try it on my dataset and let you know.

Contributor
Posts: 39

Re: Sort a huge dataset in Mainframe

Posted in reply to carl_sommer_sas_com

Thank you Carl,

I will try the options above today and let you know the execution time.

Super User
Posts: 7,762

Re: Sort a huge dataset in Mainframe

Determine how large your data set is physically (the sum of the variable lengths multiplied by the number of records).

Depending on that, you might consider exporting the data to a flat file and sorting it externally (for example, on Linux).
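A quick way to get that size estimate is to query the data set metadata. This is a sketch that assumes the data set is `WORK.LARGE`; the library and member names must be given in uppercase, and the result is the uncompressed data size, not the exact space used on disk.

```sas
/* Approximate physical size: observation length times number of observations */
proc sql;
  select nobs,
         obslen,
         nobs * obslen as approx_bytes format=comma20.
    from dictionary.tables
   where libname = 'WORK'      /* assumed library name */
     and memname = 'LARGE';    /* assumed data set name */
quit;
```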

In my own experience, SAS performance on z/OS is surprisingly poor; after migrating to a 2-CPU pSeries server running AIX, we noticed that it ran circles around the mainframe.

Also keep in mind that SAS generates a utility file while sorting, and then writes the sorted data back.

You should make sure that the source and target of the sort are not located where your WORK library is (or the place where the UTILLOC system option points to).
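To see where these locations currently are, something like the following could be run. It is only a sketch: the libref `lib` is assumed to be the one holding the sort output, as in the earlier example.

```sas
/* Where does SAS write its utility files? */
proc options option=utilloc;
run;

/* Physical location of the WORK library */
%put NOTE: WORK is at %sysfunc(pathname(work));

/* Physical location of the output library (assumed libref LIB) */
%put NOTE: LIB is at %sysfunc(pathname(lib));
```

If these resolve to the same volumes or storage, the sort input, utility file, and output are all competing for the same I/O path.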

From your description I think that you are heavily I/O bound, which is why splitting the data set did not make a difference.

Valued Guide
Posts: 2,177

Re: Sort a huge dataset in Mainframe

Sometimes data are partially sorted but you need that final sorting exercise.

If that situation occurs, look for subsets that are already in order. Removing these subsets from the data to be sorted might reduce the demand for sort work space. (On that note: what sort work area sizes have you defined?)
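As an illustration of that idea, suppose the first part of the data set is already in BY order and only records appended later are unsorted. The split point `n_sorted` below is purely hypothetical, and the data set and variable names are placeholders.

```sas
/* Hypothetical: the first &n_sorted observations are already in BY order */
%let n_sorted = 35000000;

/* Sort only the unsorted tail */
proc sort data=large (firstobs=%eval(&n_sorted + 1)) out=lib.tail;
  by by_vars;
run;

/* Interleave the ordered head with the newly sorted tail */
data whole;
  set large (obs=&n_sorted) lib.tail;
  by by_vars;
run;
```

If the head is not really in order, the DATA step stops with a "BY variables are not properly sorted" error, which doubles as a check on the assumption.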

Check the SAS Companion documentation for your mainframe platform.

Good luck

peterC

Contributor
Posts: 39

Re: Sort a huge dataset in Mainframe

Thank you Peter,

That is a helpful answer, but my dataset has close to 60 columns. I will of course check your suggestion too and let you know.
