And following @SASKiwi 's line of thought:
Why do you need to sort the table in first place? What do you intend to do with it? Eventually creating an index would be better ...but it depends on what you intend to do with the table.
If you don't need all the columns then drop them when reading the data.
/*
Here could give you a clue.
*/
data _11 _12 _13 _14 _15 _16;
set sashelp.class;
if age=11 then output _11;
else if age=12 then output _12;
else if age=13 then output _13;
else if age=14 then output _14;
else if age=15 then output _15;
else if age=16 then output _16;
run;
data want;
set _11 _12 _13 _14 _15 _16;
run;
Following up on @Ksharp 's hint, Troy Hughes has a bunch of papers (and books) on rolling your own parallel processing in SAS, e.g.:
https://support.sas.com/resources/papers/proceedings17/0870-2017.pdf
https://support.sas.com/resources/papers/proceedings16/11888-2016.pdf
One more approach could be:
proc contents data=sashelp.cars;
run;
/* sashelp.cars has 428 obs so... */
/* ...sort it in parts (with a macro loop probably) */
proc sort data=sashelp.cars(firstobs=1 obs=150) out=part1;
by model make origin;
run;
proc sort data=sashelp.cars(firstobs=151 obs=300) out=part2;
by model make origin;
run;
proc sort data=sashelp.cars(firstobs=301 obs=428) out=part3;
by model make origin;
run;
/* ...and combine with interleaving */
data sorted;
set part1 part2 part3;
by model make origin;
run;
and don't forget about data compression if variables are long.
Bart
A few interesting takes on sorting.
But are thay actuallay faster than a "standard" PROC SORT?
If so, I think the developers of PROC SORT didn't do a good job...
For SPDE: it's included in Base SAS, so it's availble to "everyone" from a license and a deployment perspective.
One benefit of SPDE it's the implict BY sorting - if you need this sort order occasionally you can have the next step call you source with a BY - no need to explict sort it. Thus lessen the need for IO.
Other than that I think minimize swapping is prioritized. Check how much available memory you have, and ramp up MEMSIZE and SORTSIZE accordingly.
Thanks LinusH. Could you give me a more detailed explanation of how to go about "check the amount of memory available and increase MEMSIZE and SORTSIZE accordingly" ? I tried setting SORTSIZE to MAX but the performance didn't improve significantly
If this takes place on a server, you probably have some kind of monitoring tool.
The most simple if on windows is to open Task Manager, or in Linux/UNIX issue the top command.
Your server admin might have more fine tuned figues to share, lika average memory used during the hours you intend to run this job.
Sp let's say there's on average 69 GB RAM available, set MEMSIZE to 64GB and SORTSIZE to a litel less, maybe 60GB?
Using
proc options option=sortsize;
runs;
Is it ok to set a value just below the one encountered by this procedure?
The idea "behind" goes like this:
If you have 8GB for sorting, and 40GB to sort, after splitting that 40GB into 10 parts (4GB each) there is a chance you will be able to sort each part in memory, without I/O swapping.
Bart
Bart, I didn't understand much. Could you make a practical example with sas code explaining the usage of these 2 sas options? Thank you
First step would be to see how much RAM for sorting your session has:
proc options option=sortsize;
run;
this will tell you how much RAM you have for sorting.
The firstobs= and obs= indicates from which observation to which observation SAS should read the dataset e.g.,
data part1;
set sashelp.class(firstobs=1 obs=10);
run;
data part2;
set sashelp.class(firstobs=11 obs=19);
run;
So if the data are ~40GB in size (and let assume 40 millions obs.), and you have ~8GB ram for sorting, then (taking into account safe margin) if you split your data into 10 even parts (4 million obs each) you will get ~4GB size by each, and you can sort each part in memory. Then when you create 10 sorted "small" datasets you can use SET statement with BY for interleaving.
proc sort data=sashelp.class(firstobs=1 obs=10) out=part1;
by age;
run;
proc sort data=sashelp.class(firstobs=11 obs=19) out=part2;
by age;
run;
data sorted;
set part part2;
by age;
run;
hope it helps
Bart
Ok, the example is clear. But is this approach a substitute for increasing the SORTSIZE? Which is better between the 2 things?
Thank you
I wouldn't say "substitute", it's just different approach.
In setups I was working we didn't have authorisation to modify session setup (e.g sortsize) so we had to search for different tools.
You need to try and compare results. If you are "the only one" user of the machine you could go with increasing sortsize, but if there are other users it may not be possible without "destabilising" their sessions... go by Maxim 4 (maxims)
Bart
from the result of proc options option=sortsize;
runs; I get the value of SORTSIZE=32212254720, but I don't understand in which unit of measure this number is expressed, could you tell me?
32212254720B = 31457280KB = 30720MB = 30GB nice setup 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.