BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Patrick
Opal | Level 21

And following @SASKiwi 's line of thought: 

Why do you need to sort the table in first place? What do you intend to do with it? Eventually creating an index would be better ...but it depends on what you intend to do with the table.

If you don't need all the columns then drop them when reading the data.

Ksharp
Super User
/*
Here could give you a clue.
*/

data _11 _12 _13 _14 _15 _16;
 set sashelp.class;
      if age=11 then output _11;
 else if age=12 then output _12;
 else if age=13 then output _13;
 else if age=14 then output _14;
 else if age=15 then output _15;
 else if age=16 then output _16;
run;

data want;
 set _11 _12 _13 _14 _15 _16;
run;
Quentin
Super User

Following up on @Ksharp 's hint, Troy Hughes has a bunch of papers (and books) on rolling your own parallel processing in SAS, e.g.:

https://support.sas.com/resources/papers/proceedings17/0870-2017.pdf

https://support.sas.com/resources/papers/proceedings16/11888-2016.pdf

 

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
yabwon
Onyx | Level 15

One more approach could be: 

proc contents data=sashelp.cars;
run;
/* sashelp.cars has 428 obs so... */


/* ...sort it in parts (with a macro loop probably) */
proc sort data=sashelp.cars(firstobs=1   obs=150) out=part1;
  by model make origin;
run;

proc sort data=sashelp.cars(firstobs=151 obs=300) out=part2;
  by model make origin;
run;

proc sort data=sashelp.cars(firstobs=301 obs=428) out=part3;
  by model make origin;
run;


/* ...and combine with interleaving */
data sorted;
  set part1 part2 part3;
  by model make origin;
run;

and don't forget about data compression if variables are long.

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



LinusH
Tourmaline | Level 20

A few interesting takes on sorting.

But are thay actuallay faster than a "standard" PROC SORT?

If so, I think the developers of PROC SORT didn't do a good job...

 

For SPDE: it's included in Base SAS, so it's availble to "everyone" from a license and a deployment perspective.

One benefit of SPDE it's the implict BY sorting - if you need this sort order occasionally you can have the next step call you source with a BY - no need to explict sort it. Thus lessen the need for IO.

 

Other than that I think minimize swapping is prioritized. Check how much available memory you have, and ramp up MEMSIZE and SORTSIZE accordingly.

Data never sleeps
mariopellegrini
Pyrite | Level 9

Thanks LinusH. Could you give me a more detailed explanation of how to go about "check the amount of memory available and increase MEMSIZE and SORTSIZE accordingly" ? I tried setting SORTSIZE to MAX but the performance didn't improve significantly

LinusH
Tourmaline | Level 20

If this takes place on a server, you probably have some kind of monitoring tool.

The most simple if on windows is to open Task Manager, or in Linux/UNIX issue the top command.

Your server admin might have more fine tuned figues to share, lika average memory used during the hours you intend to run this job.

 

Sp let's say there's on average 69 GB RAM available, set MEMSIZE to 64GB and SORTSIZE to a litel less, maybe 60GB?

 

Data never sleeps
mariopellegrini
Pyrite | Level 9

Using
proc options option=sortsize;
runs;
Is it ok to set a value just below the one encountered by this procedure?

yabwon
Onyx | Level 15

 

The idea "behind" goes like this:

If you have 8GB for sorting, and 40GB to sort, after splitting that 40GB into 10 parts (4GB each) there is a chance you will be able to sort each part in memory, without I/O swapping.

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



mariopellegrini
Pyrite | Level 9

Bart, I didn't understand much. Could you make a practical example with sas code explaining the usage of these 2 sas options? Thank you

yabwon
Onyx | Level 15

First step would be to see how much RAM for sorting your session has:

proc options option=sortsize;
run;

this will tell you how much RAM you have for sorting.

 

The firstobs= and obs= indicates from which observation to which observation SAS should read the dataset e.g.,

data part1;
  set sashelp.class(firstobs=1 obs=10);
run;

data part2;
  set sashelp.class(firstobs=11 obs=19);
run;

So if the data are ~40GB in size (and let assume 40 millions obs.), and you have ~8GB ram for sorting, then (taking into account safe margin) if you split your data into 10 even parts (4 million obs each) you will get ~4GB size by each, and you can sort each part in memory. Then when you create 10 sorted "small" datasets you can use SET statement with BY for interleaving.

proc sort data=sashelp.class(firstobs=1 obs=10) out=part1;
  by age;
run;

proc sort data=sashelp.class(firstobs=11 obs=19) out=part2;
  by age;
run;

data sorted;
  set part part2;
  by age;
run;

hope it helps

Bart

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



mariopellegrini
Pyrite | Level 9

Ok, the example is clear. But is this approach a substitute for increasing the SORTSIZE? Which is better between the 2 things?

Thank you

yabwon
Onyx | Level 15

I wouldn't say "substitute", it's just different approach.

In setups I was working we didn't have authorisation to modify session setup (e.g sortsize) so we had to search for different tools.

You need to try and compare results. If you are "the only one" user of the machine you could go with increasing sortsize, but if there are other users it may not be possible without "destabilising" their sessions... go by Maxim 4 (maxims)

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



mariopellegrini
Pyrite | Level 9

from the result of proc options option=sortsize;
runs; I get the value of SORTSIZE=32212254720, but I don't understand in which unit of measure this number is expressed, could you tell me?

yabwon
Onyx | Level 15

32212254720B = 31457280KB = 30720MB = 30GB nice setup 🙂



_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 29 replies
  • 3696 views
  • 7 likes
  • 11 in conversation