Hi,
I have a SAS dataset on the mainframe. Sorting it with PROC SORT takes 8 hours, as it has 40 million records. I have tried the TAGSORT and THREADS options, but they made no difference.
Can anybody tell me an efficient way to sort the dataset?
Thank you.
Are you trying to run this in interactive SAS or in a batch job? My guess is that you are contending for
operating system resources / scheduling, especially if you are running interactively. This is a more appropriate
activity for a batch job. (and then you can look at things like region size, etc)
Also, look at the following SAS options:
SORTPGM=
SORTSIZE=
My guess is that you have a sort package on your mainframe, and thus SORTPGM=HOST is the appropriate
option setting. (I'm dredging this up from memory).
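A minimal sketch of how those two options might be checked and set, assuming a host sort utility is installed at your site. The SORTSIZE= value is purely illustrative, and `large`, `lib.sorted` and `by_vars` are placeholder names, not anything from your system:

```sas
/* Show the current settings before changing anything */
proc options option=sortpgm;
run;
proc options option=sortsize;
run;

/* Assumption: a host sort utility is available.
   256M is an illustrative value, not a tuned recommendation. */
options sortpgm=host sortsize=256M;

/* Placeholder data set and BY variable names */
proc sort data=large out=lib.sorted;
  by by_vars;
run;
```

Comparing the elapsed and CPU times in the log for each setting is the only reliable way to tell which one suits your data.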
Carl
Apart from Carl's suggestions, you could split the data in two and sort each half in parallel in two separate programs. That should nearly halve the processing time.
Sort program1:
/* half_total_obs is a placeholder: substitute the actual number (half the total row count) */
proc sort data = large (obs = half_total_obs)
out = lib.half1
;
by by_vars;
run;
Sort program2:
/* half_total_obs + 1 is a placeholder: FIRSTOBS= needs an actual number, not an expression */
proc sort data = large (firstobs = half_total_obs + 1)
out = lib.half2
;
by by_vars;
run;
Combine 2 sorts program:
data whole;
set lib.half1 lib.half2; /* SET with BY interleaves the two sorted halves */
by by_vars;
run;
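The split point used in the two sort programs could be worked out rather than typed in by hand. This is only a sketch, assuming `large` is a disk data set whose row count SAS knows; the macro variable names `half` and `half_plus1` are made up for illustration:

```sas
/* Assumption: LARGE is a disk data set for which SAS knows the row count */
data _null_;
  if 0 then set large nobs=n;                   /* NOBS= is filled in without reading any rows */
  call symputx('half', ceil(n / 2));            /* the first half gets the extra row when n is odd */
  call symputx('half_plus1', ceil(n / 2) + 1);  /* first row of the second half */
  stop;
run;

%put NOTE: half=&half half_plus1=&half_plus1;

/* The values would then be used as (obs = &half) in the first sort
   and (firstobs = &half_plus1) in the second. */
```

Because the two sorts run as separate programs, each one would need to run this step itself, since macro variables are not shared between SAS sessions.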
Thank you SASKiwi,
I have tried this approach on another dataset, but it did not help. I will try it on my dataset and let you know.
Thank you Carl,
I will try the options above today and let you know the execution time.
I tried coding the option SORTPGM=HOST, but it did not help; it actually took longer to execute.
Before I changed it, SORTPGM was set to BEST. With BEST it took 2 hours to sort 10 million records; with HOST it took about 3 hours.
@SASKiwi:
I tried it this way, but it did not help. This method takes the same time to sort 10 million records as a single PROC SORT.
Determine how large your data set is physically (sum of the variable lengths * number of records).
Depending on that, you might consider exporting the data to a flat file and sorting that externally (e.g. on Linux).
In my own experience, SAS performance on z/OS is surprisingly bad; after migrating to a 2-CPU pSeries (with AIX) we noticed that it ran circles around the MF.
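One way to get that figure without adding up variable lengths by hand is to query the SAS dictionary tables. This is a rough sketch; the library and member names are assumptions and need to be replaced with your own:

```sas
proc sql;
  select memname,
         nobs,                                    /* number of records */
         obslen,                                  /* bytes per record */
         nobs * obslen as approx_bytes format=comma20.
  from dictionary.tables
  where libname = 'WORK'      /* assumption: adjust to your library */
    and memname = 'LARGE';    /* names are stored in upper case */
quit;
```

The result is only an approximation of the physical size, since it ignores page overhead and any compression.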
Also keep in mind that SAS generates a utility file while sorting, and then writes the sorted data back.
You should make sure that the source and target of the sort are not located where your WORK library is (or the place where the UTILLOC system option points to).
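To see where these locations currently point, something like the following could be run. It is a sketch only; `lib` stands for whatever libref holds your source or target data set:

```sas
/* Write the physical locations to the log for comparison */
%put NOTE: WORK    = %sysfunc(pathname(work));
%put NOTE: UTILLOC = %sysfunc(getoption(utilloc));
%put NOTE: LIB     = %sysfunc(pathname(lib));   /* hypothetical libref */
```

UTILLOC can only be set at SAS invocation or in the configuration file, so changing it means changing the job setup rather than the program.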
From your description I think that you are heavily I/O bound; that's why the trick of dividing the data set did not make a difference.
Sometimes data are partially sorted but you need that final sorting exercise.
If that situation occurs, look for subsets that are already ordered. Removing these subsets from the whole might reduce the demand for sort work space. (Thinking about those: what sort work area sizes have you defined?)
Check out the companion for SAS on your mainframe.
good luck
peterC
Thank you Peter,
This is a nice answer, but my dataset has close to 60 columns. Of course I will check your option too and let you know.