02-19-2014 12:49 PM
I have a SAS dataset on the mainframe. Sorting it with PROC SORT takes 8 hours, as it has 40 million records. I have tried the TAGSORT and THREADS options, but they made no difference.
Can anybody suggest an efficient way to sort the dataset?
02-19-2014 01:57 PM
Are you trying to run this in interactive SAS or in a batch job? My guess is that you are contending for
operating system resources / scheduling, especially if you are running interactively. This is a more appropriate
activity for a batch job. (and then you can look at things like region size, etc)
Also, look at the following SAS options:
My guess is that you have a sort package on your mainframe, and thus SORTPGM=HOST is the appropriate
option setting. (I'm dredging this up from memory).
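A minimal sketch of checking and changing the setting, in case it helps. The library, data set and key names (lib.large, lib.large_sorted, key_var) are placeholders I made up, not anything from your job:

```sas
/* List the current values of the sort-related system options */
proc options group=sort;
run;

/* Ask SAS to hand subsequent sorts to the host sort utility */
options sortpgm=host;

/* Placeholder names: substitute your own library, data set and BY variables */
proc sort data=lib.large out=lib.large_sorted;
  by key_var;
run;
```

Comparing the log notes (real time vs. CPU time) for a run with SORTPGM=HOST and one with SORTPGM=BEST should show which one actually wins on your system.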
02-19-2014 02:19 PM
Apart from Carl's suggestions, you could split the data in two and sort each half in a separate program, with both programs running in parallel. That should nearly halve the processing time.
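/* Program 1: first half. 20000000 is half of ~40 million; adjust to your count. key_var stands for your sort key(s). */
proc sort data = large (obs = 20000000) out = lib.half1;
  by key_var;
run;
/* Program 2: second half, submitted at the same time as program 1. */
proc sort data = large (firstobs = 20000001) out = lib.half2;
  by key_var;
run;
Combine 2 sorts program:
data lib.sorted;
  set lib.half1 lib.half2;
  by key_var; /* interleaves the halves; without BY they would only be concatenated */
run;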
02-27-2014 11:18 PM
I tried coding options SORTPGM=HOST, but it did not help; the sort took even longer to execute.
Before I changed it, SORTPGM was set to BEST. With BEST it took 2 hours to sort 10 million records; with HOST it took about 3 hours.
I tried this approach as well, but it did not help: it takes the same time to sort 10 million records as a single PROC SORT.
02-28-2014 01:18 AM
Determine how large your data set is physically (sum of the variable lengths * number of records).
Depending on that, you might consider exporting the data to a flat file and sorting that externally (e.g. on Linux).
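One way to get that number without adding up the variable lengths by hand is to read it from the dictionary tables. This is only a sketch: LIB and LARGE are placeholder names, and the result is the uncompressed data size, so a compressed data set will occupy less on disk.

```sas
/* Approximate physical size = observation length * number of observations */
proc sql;
  select memname,
         nobs,
         obslen,
         nobs * obslen as approx_bytes format=comma20.
    from dictionary.tables
   where libname = 'LIB'      /* placeholder libref, must be upper case */
     and memname = 'LARGE';   /* placeholder data set name, upper case  */
quit;
```

PROC CONTENTS on the data set reports the same two figures (observation length and number of observations) if you prefer that.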
In my own experience, SAS performance on z/OS is surprisingly bad; after migrating to a 2-CPU pSeries (with AIX), we noticed that it ran circles around the MF.
Also keep in mind that SAS generates a utility file while sorting, and then writes the sorted data back.
You should make sure that the source and target of the sort are not located where your WORK library is (or the place where the UTILLOC system option points to).
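To see where things currently live, something like the following writes the locations to the log. The libref lib is a placeholder for wherever your source and target data sets are:

```sas
/* Print the physical locations involved in the sort to the SAS log */
%put WORK location:   %sysfunc(pathname(work));
%put UTILLOC setting: %sysfunc(getoption(utilloc));
%put Data location:   %sysfunc(pathname(lib));   /* placeholder libref */
```

If these all end up on the same volume or storage pool, the sort is reading the input, writing the utility file and writing the output against the same device.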
From your description I think that you are heavily I/O bound; that's why the trick of dividing the data set did not make a difference.
02-19-2014 02:22 PM
Sometimes data are partially sorted but you need that final sorting exercise.
If that situation occurs, look for subsets that are already ordered. Removing these subsets from the whole might reduce the demand for sort work space. (Thinking about that: what sort work area sizes have you defined?)
Check out the companion for SAS on your mainframe.
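A rough sketch of the subset idea, under heavy assumptions: is_ordered is a made-up flag that marks the rows you believe are already in key order, and lib.large, lib.ordered, lib.rest and key_var are placeholder names. The PRESORTED option is only available in SAS releases that support it; it makes PROC SORT check the order first and skip the sort when the data are already in sequence.

```sas
/* Split off the part that is believed to be in key order already */
data lib.ordered lib.rest;
  set lib.large;
  if is_ordered = 1 then output lib.ordered;   /* hypothetical flag */
  else output lib.rest;
run;

/* Verify the ordered part; it is sorted only if the check fails */
proc sort data=lib.ordered presorted;
  by key_var;
run;

/* Only the remainder needs a real sort */
proc sort data=lib.rest;
  by key_var;
run;

/* Interleave the two sorted parts into the final data set */
data lib.large_sorted;
  set lib.ordered lib.rest;
  by key_var;
run;
```

The split and the interleave each cost a full pass over the data, so this only pays off when the unordered remainder is small compared with the whole.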
02-20-2014 05:08 AM
Thank you Peter,
This is a nice answer, but my dataset has close to 60 columns. Of course I will check your option too and will let you know.