Frequently a SAS programmer needs to both sort and divide a dataset. Typically these tasks are satisfied via a two stage program: (1) divide the dataset, (2) sort the parts. Something like:
data _asia (where=(origin='Asia')) cars_europe (where=(origin='Europe'))
cars_usa (where=(origin='USA')) /view=_asia;
proc sort data=_asia out=cars_asia;
proc sort data=cars_europe;
proc sort data=cars_usa;
Enabling proc sort to do both of these tasks (with syntax such as below) would often significantly reduce input/output activity, and offer a bit of code simplification:
proc sort data=sashelp.cars
cars_usa (where=(origin='USA')) ;
Of course, the multiple output datasets need not be mutually exclusive like the example above.
Mark, two things:
1) I like the idea you have my voice on this one!
2) For "code simplification" case, you can always do one sorting:
proc sort data=sashelp.cars out=cars_sorted;
data cars_asia (where=(origin='Asia'))
All the best
The syntactically correct way would be to not have multiple datasets in the OUT= option, but to allow multiple OUT= options.
Talking about Proc Sort.
It would be also nice to make the KEY statement finally work better than "documented" 😉
Specifies sorting keys and variables. The KEY statement is an alternative to the BY statement. The KEY statement syntax allows for the future possibility of specifying different collation options for each KEY variable. Currently, the only options allowed are ASCENDING and DESCENDING.
Kurt, or maybe:
OUT = ( dataSet1 dataSet2 ... )
@yabwon on second thought, I think the brackets are better than the multiple options.
I had considered using the sort first/split second code
as the foil to contrast to my proposal. But since my primary justification was for efficiency, I chose to start out with the most efficient currently available technique - split first (using a view for one of the splits), and sort second.
As to optimal syntax (multiple out= vs multiple arguments of a single out=), I used the data statement as my model for the out= syntax. I don't have a strong feeling on this issue, as long as it seamlessly supports the use of current dataset name options, especially (of course) where=, keep=, drop=, rename=.
I've forwarded this request to our Product Management team - thanks!
With this following syntax, which looks almost identical to that shown by @yabwon you can accomplish the goal here, which is to avoid the extra IO from reading the sorted file to subsequently split into multiple datasets. This uses the experimental feature called "Output Data Step Views" which are incredibly powerful!
if 0 then set sashelp.cars;
proc sort data=sashelp.cars out=split_sort;
1 data cars_asia(where=(origin='Asia'))
7 if 0 then set sashelp.cars;
8 set split_sort;
NOTE: DATA STEP view saved on file WORK.SPLIT_SORT.
NOTE: A stored DATA STEP view cannot run under a different operating system.
WARNING: The definition of an output DATA step view is an experimental feature in this release and is not intended for use in the
development of production applications.
11 proc sort data=sashelp.cars out=split_sort;
12 by msrp;
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set WORK.SPLIT_SORT has 428 observations and 15 variables.
NOTE: The data set WORK.CARS_ASIA has 158 observations and 15 variables.
NOTE: The data set WORK.CARS_EUROPE has 123 observations and 15 variables.
NOTE: The data set WORK.CARS_USA has 147 observations and 15 variables.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.