07-20-2011 04:59 PM
Say you have a data set with variables A, B, C, D, E that has been sorted with PROC SORT by A, B, C. You then create subsets (new data sets) from that original data set, but you never change the original sorted data. Later on I need the original data set to be sorted by A, B. Isn't it already sorted by A, B, since I sorted it by A, B, C originally? I am taking over a piece of code at work, the data set takes a long time to sort (lots of data), and I noticed it is sorted twice. I was hoping to eliminate the second sort. Thanks in advance.
07-20-2011 05:26 PM
Yes, if you sort by A B C, it is by definition sorted by A B. I find that most programmers SORT way too much. It is important to remember that SORTED does not necessarily mean that the data was created by PROC SORT. For example, PROCs that use CLASS statements create summary data that is, in most cases, sorted by the CLASS variables. It is good to learn and exploit ways to create or maintain data in sorted order.
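A minimal sketch of the point above, using a small made-up data set (the names HAVE, FIRST_AB and SUMS are hypothetical, not from the original program): one PROC SORT by A B C, then a BY A B step with no second sort, plus a PROC MEANS output with a CLASS statement that is used with BY A B directly.

```sas
/* Hypothetical data set HAVE with variables A-E (values are made up). */
data have;
   input a b c d e;
   datalines;
2 1 3 10 20
1 2 1 11 21
1 1 2 12 22
1 1 1 13 23
2 1 1 14 24
;
run;

/* The one sort that is actually needed. */
proc sort data=have;
   by a b c;
run;

/* No second PROC SORT: data in A B C order is also in A B order, */
/* so this BY A B step runs without a "not properly sorted" error. */
data first_ab;
   set have;
   by a b;
   if first.b;   /* keep the first observation of each A/B group */
run;

/* Summary output from a CLASS statement comes out in ascending */
/* A B order by default, so it can also feed a BY A B step.     */
proc means data=have noprint nway;
   class a b;
   var d;
   output out=sums sum=d_sum;
run;

data _null_;
   set sums;
   by a b;
run;
```

With the sample values there are three A/B groups, so FIRST_AB and SUMS each end up with three observations.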
07-21-2011 06:17 AM
Obviously, in Null's examples these outputs are sorted by definition, but the PRESORTED option of PROC SORT could help, especially if you're using data created outside your own program.
07-21-2011 10:57 AM
PRESORTED would be a good place to start. The OP could add the option to the existing PROC SORT steps and measure any performance difference. The gain might be enough to be "good enough" without rewriting everything, which would require a good bit of testing. It depends on the app and the level of risk, I reckon.
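A minimal sketch of that approach, assuming a SAS release whose PROC SORT supports the PRESORTED option (check the documentation for your version). The data set names HAVE and HAVE_AB are hypothetical. PRESORTED asks PROC SORT to check whether the observations are already in BY order and, if they are, to skip the sort and just copy the data.

```sas
/* Hypothetical input that is already in A B order but carries */
/* no sort flag (it was not produced by PROC SORT).            */
data have;
   input a b c;
   datalines;
1 1 1
1 1 2
1 2 1
2 1 1
;
run;

/* PRESORTED: PROC SORT first checks the sequence of the input; */
/* if it is already in A B order, the data is copied, not sorted. */
proc sort data=have out=have_ab presorted;
   by a b;
run;
```

To measure the difference on the real data, compare the log timings of the step with and without PRESORTED.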
07-22-2011 12:34 PM
If you want to avoid sorting a data set more than once, the best way is to create an index on the data set; then you never need PROC SORT to pre-order the data before a BY statement.
This is especially useful for large tables.
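A minimal sketch of index-based BY processing, assuming a composite index on the BY variables; the data set BIG, the index name AB and the output COUNTS are hypothetical. Reading a large table through an index is not always faster than sorting it, so it is worth timing both on the real data.

```sas
/* Hypothetical unsorted data set. */
data big;
   input a b c;
   datalines;
2 1 1
1 2 1
1 1 2
1 1 1
;
run;

/* Build a composite index on the BY variables. */
proc datasets library=work nolist;
   modify big;
   index create ab = (a b);
quit;

/* No PROC SORT: with an index on A B, SAS can return the */
/* observations in A B order for BY-group processing.     */
data counts;
   set big;
   by a b;
   if first.b then n = 0;
   n + 1;               /* count observations in the A/B group */
   if last.b then output;
   keep a b n;
run;
```

With the sample values, COUNTS has one row per A/B group: (1,1) with 2 observations, and (1,2) and (2,1) with 1 each.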
07-22-2011 02:02 PM
As others have pointed out, it should already be sorted.
You can easily let SAS know that it's sorted by setting the sortedBy= data set option. This can be useful when creating subsets of the parent data set that you ABSOLUTELY KNOW ARE STILL IN THE SORT ORDER. Here is the syntax:
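data junk (sortedBy= i x);  /* flags JUNK as sorted by I X; SAS does not check */
   do i = 1 to 10;
      x = i * 10; output;   /* placeholder: the original value of x is not shown */
   end;
run;
proc contents data = junk; run;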
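A minimal sketch of the subset case described above; the names PARENT and SUBSET and the WHERE condition are hypothetical. A WHERE subset of a sorted data set only drops rows, so the remaining rows stay in the parent's order, and SORTEDBY= records that order. Because SAS trusts SORTEDBY= without checking, the sketch adds an optional BY step that should stop with an error if the rows are not really in A B C order.

```sas
/* Hypothetical parent data set, sorted once by A B C. */
data parent;
   input a b c d;
   datalines;
1 1 2 6
1 1 1 5
2 1 1 8
1 2 1 7
;
run;

proc sort data=parent;
   by a b c;
run;

/* Dropping rows keeps the parent's order; SORTEDBY= records it. */
data subset (sortedby=a b c);
   set parent;
   where d > 5;
run;

/* Optional safety net: BY processing reports an error if */
/* SUBSET is not actually in A B C order.                 */
data _null_;
   set subset;
   by a b c;
run;

/* The sort information should now appear in the output. */
proc contents data=subset;
run;
```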
Skipping redundant sorts is usually the best way to gain efficiency in code, but do yourself a big favor and comment the hell out of it.
Good luck. -s www.sascoders.com