1800bigk
Fluorite | Level 6

Say you have a data set with variables A, B, C, D, E that is sorted with PROC SORT by A, B, C. You then create subsets (new data sets) from that original data set but never change the original sorted data. Later on I need my original data set to be sorted by A, B. Isn't it already sorted by A, B, since I sorted by A, B, C originally? I am taking over a piece of code at work, and the data set takes a long time to sort (lots of data). I noticed it is sorted twice, and I was hoping to eliminate the second sort. Thanks in advance.

5 REPLIES
data_null__
Jade | Level 19

Yes, if you sort by A B C it is by definition sorted by A B. I find that most programmers SORT way too much. It is important to remember that SORTED does not necessarily mean that the data was created by PROC SORT. For example, PROCs that use CLASS statements create summary data that is sorted by the CLASS variables in most cases. It is good to learn and exploit ways to create or maintain data in sorted order.
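A minimal sketch of both points, assuming a small made-up data set named HAVE in place of the real table (the data values, the output names FIRST_AB and SUM_AB, and the variable D_TOTAL are hypothetical):

```sas
/* Hypothetical sample data standing in for the real table (variables A to E). */
data have;
  input a b c d e;
  datalines;
2 1 3 10 100
1 2 1 20 200
1 1 2 30 300
1 1 1 40 400
2 1 1 50 500
;
run;

/* The one expensive sort, by all three keys. */
proc sort data=have;
  by a b c;
run;

/* BY A B needs no second sort: rows ordered by A, B, C are also ordered by A, B.
   If the order were wrong, this step would stop with a
   "BY variables are not properly sorted" error. */
data first_ab;
  set have;
  by a b;
  if first.b;   /* keep the first row of each A, B group */
run;

/* A CLASS statement groups without requiring sorted input; with NWAY the
   output has one row per A, B combination, in A, B order by default. */
proc summary data=have nway;
  class a b;
  var d;
  output out=sum_ab (drop=_type_ _freq_) sum=d_total;
run;
```

The DATA step is a cheap way to confirm that the second sort is redundant: it either runs cleanly or fails loudly at the first out-of-order row.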

DF
Fluorite | Level 6

The PRESORTED option, added to PROC SORT in SAS 9.2, could help avoid the heavy sorting cost while still letting you be completely certain the data is sorted as required.

Obviously, in data_null__'s examples the outputs are sorted by definition, but this could help especially if you're using data created outside your own program.

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146878.htm
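As a rough sketch of how the option is used (the data set HAVE and its values are made up for illustration):

```sas
/* Hypothetical sample data, already in A, B, C order. */
data have;
  input a b c d e;
  datalines;
1 1 1 40 400
1 1 2 30 300
1 2 1 20 200
2 1 1 50 500
2 1 3 10 100
;
run;

/* PRESORTED (SAS 9.2 and later) makes PROC SORT check the sequence first.
   If the rows are already in BY order, the sort itself is skipped;
   otherwise the data set is sorted as usual. */
proc sort data=have presorted;
  by a b;
run;
```

The sequence check still reads the data, so it is not free, but a sequential pass is generally much cheaper than a full sort. The SAS log reports which path was taken.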

data_null__
Jade | Level 19

PRESORTED would be a good place to start. The OP could add the option to the existing PROC SORT steps and measure any performance difference. The improvement might be "good enough" without having to rewrite everything, which would require a good bit of testing. Depends on the app and level of risk, I reckon.

Ksharp
Super User

If you want to avoid sorting a data set more than once, the best way is to create an index on it. Then you never need PROC SORT to pre-order the data set before a BY statement.

This is especially useful for large tables.
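A small sketch of the index approach, assuming a made-up unsorted data set HAVE and a hypothetical composite index named AB:

```sas
/* Hypothetical sample data, deliberately NOT sorted. */
data have;
  input a b c d e;
  datalines;
2 1 3 10 100
1 2 1 20 200
1 1 2 30 300
1 1 1 40 400
2 1 1 50 500
;
run;

/* Create a composite index on (A, B). The index name AB is arbitrary,
   but it must not match the name of a variable in the data set. */
proc datasets library=work nolist;
  modify have;
  index create ab = (a b);
quit;

/* BY A B now works without PROC SORT: SAS uses the index to
   read the rows in A, B order. */
data first_ab;
  set have;
  by a b;
  if first.b;
run;
```

Reading an entire large table through an index is not always faster than sorting it once, so it is worth timing both approaches on the real data.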

Ksharp

sasCoders_com
Calcite | Level 5

As others have pointed out, it should already be sorted.

You can easily let SAS know that it's sorted by setting the sortedBy= data set option.  This can be useful when creating subsets of the parent data set that you ABSOLUTELY KNOW ARE STILL IN THE SORT ORDER.  Here is the syntax:

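/* Assert the sort order on the output data set; SAS records it but does not verify it here. */
data junk (sortedBy= i x);
  do i = 1 to 10;
    x + 1;   /* sum statement: x is retained and incremented each iteration */
    output;
  end;
run;

/* The sort information section of the output shows the asserted order. */
proc contents data = junk;
run;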
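Applied to the original question, a sketch might look like the following. The data set HAVE, the subset name SUBSET, and the WHERE condition are all hypothetical:

```sas
/* Hypothetical sample data standing in for the parent table. */
data have;
  input a b c d e;
  datalines;
2 1 3 10 100
1 2 1 20 200
1 1 2 30 300
1 1 1 40 400
2 1 1 50 500
;
run;

proc sort data=have;
  by a b c;
run;

/* A subset read sequentially from the sorted parent keeps the row order,
   but the DATA step does not copy the parent's sort indicator.
   SORTEDBY= asserts the order on the new data set. */
data subset (sortedby=a b c);
  set have;
  where d >= 20;
run;

/* Check the sort information that was recorded for SUBSET. */
proc contents data=subset;
run;
```

Because SORTEDBY= is an assertion rather than a check, it is only safe when the step cannot reorder rows, as with a plain SET plus WHERE here.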

Skipping redundant sorts is usually the best way to gain efficiency in code, but do yourself a big favor and comment the hell out of it :)

Good luck. -s www.sascoders.com

