This is part 2 of my previous article on appending data. Now we’ll focus on sorting data.
PROC SORT is probably one of the utility procedures the most used by SAS programmers. Who has not complained 😊 about having to sort the data prior to run a task that requires a BY statement? I don’t need a poll to find out.
Sorting data in the SAS world (and using PROC SORT) is helpful and sometimes mandatory in the following circumstances (list might not be complete):
What about CAS?
2 main messages.
The order of rows in a CAS table is not predictable. Thus, you cannot create a CAS table in a sorted way. But you can partition it (distribute the data across the CAS workers according to by variables using the GROUPBY option). Then, within partitions, you can sort the data according to additional variables (using the ORDERBY option) which could help in some situations.
If you try to sort a CAS table and store it in CAS, you get the following error message:
73 proc sort data=casdm.bigprdsale out=casdm.bigprdsale_sorted ;
74 by actual ;
75 run ;
ERROR: Operation is not supported when both the source and destination include CAS libname references.
ERROR: An error has occurred.
Similar message when you use the FedSQL CREATE TABLE … ORDER BY statement.
Does that mean that CAS is not able to sort data? No. Of course CAS can sort data. If you output the result of your sort procedure in SAS, then you have:
73 proc sort data=casdm.bigprdsale out=work.bigprdsale_sorted ;
74 by actual ;
75 run ;
NOTE: Sorting was performed by the data source.
NOTE: There were 1440 observations read from the data set CASDM.BIGPRDSALE.
NOTE: The data set WORK.BIGPRDSALE_SORTED has 1440 observations and 10 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.13 seconds
cpu time 0.02 seconds
And you can see that CAS sorted the data.
But do you really need a sorted table?
With Viya and CAS, you no longer need to sort input tables before running a task with a BY statement. CAS will process the by-groups automatically on the fly for you.
So, you can directly write and run your magical DATA Step MERGE BY without having to care about sorting. No more PROC SORT before your DATA Step. Behind the scenes, some partitioning will occur on the fly to organize the data appropriately in order to handle the by-groups. One can anticipate this by pre-partitioning the data before.
What if you were using PROC SORT to detect top/bottom values?
You can use the fetch CAS action with the sortedBy and the to options to identify top N (or bottom N by reversing the order) records according to the order of one or several variables. Unfortunately, the result is not a CAS table (fetch actually means to pull the result on the client) so use this with caution and reasonable result sets.
The following example identifies the top 3 records with the highest actual value, fetches them on the client before moving them back to CAS using the saveresult CASL statement. This is not ideal but that should work fine in most cases.
proc cas ;
table.fetch result=r / table={caslib="dm" name="bigprdsale"} to=3 sortby={{name="actual" order="descending"}} ;
sessionProp.setSessOpt / caslib="dm" ;
saveresult r casout="top3" ;
run ;
quit ;
What if you were using PROC SORT to de-duplicate records?
There are other alternatives that we’ll discuss in the next part of the blog.
Thanks for reading and stay tuned for my next article on de-duplicating data.
What can I do about the following error if I want to data step merge without sorting?
ERROR: Input data sets cannot be combined because they have different
collating sequences (SORTSEQ).
Hello.
Do you face this problem with data stored in CAS?
No, one was in hadoop and one was in cas. That was the problem. Thanks.
I am trying to rewrite this code to run in cas, but I don't know what to do about the sort descending. Could you help?
proc sort data=mydata;
by a b c descending d e f descending g h;
quit;
data mydata2;
set mydata;
by a b c;
if b ne c then do;
if first.c ne 1 then delete;
end;
run;
Hello,
DESCENDING is now supported in SAS Viya 3.5 with some caveats (https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=lestmtsref&docsetTarge...).
You can also check Steven Sober's blogs on this topic:
One of the ideas is to create a view with the opposite value and sort ascending on it.
Nicolas.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.