There are numerous methods of sorting tables in SAS Studio, including using a ‘good old’ PROC SORT! During this post, we explore the importance of sorting SAS tables and evaluate each method, focusing on the point-and-click techniques in.
Sorting data can be a helpful or necessary step in exploring your data. Some data processing steps need sorted input data. Other benefits of sorting include being able to visually examine the high or low values and even identifying and removing duplicate rows.
When sorting data, you can specify which columns to sort by.
For example, by adding Subject as the primary column, this data table will be sorted in ascending values of Subject (so, in alphabetical order, A to Z). Notice that the original order has been honored by default.
When adding multiple columns, SAS sorts the primary column first and then the secondary column within the primary column. In this example, Subject is sorted in ascending order and then TestScore is sorted in ascending order, within the Subject values.
You can specify the order of the rows to view large to small values by using the descending option.
For example, these rows are primarily sorted by Subject in ascending order and then, within the subject, by descending values of TestScore.
When sorting, the default behavior is to keep all records (rows), including duplicates.
Notice that we have a duplicate value for Name and for Subject in the first two rows and duplicate Name, Subject, and TestScore values in the last two rows.
If the direction is to remove duplicate rows, all values for a row are compared to the previous row written to the output table. If an exact match is found, the row is removed. In this example, Louise with a French test result of 92 is removed, because it’s a duplicate row.
If the instruction is to remove rows with duplicate values in the sort column, duplicates are removed based on the values of the columns on which the output is sorted. In this example, the second row of Bobby with a French result has been removed and the second row of Louise with a French result has been removed.
With multiple methods available, each sorting technique serves a specific initial purpose based on the requirements.
The Query step is an extremely versatile tool for data manipulation. You can select, sort, and create new columns, join tables, summarise data and filter both input or summarised data.
The CLASS_TESTDUP data contains seven rows, the following GIF demonstrates how to sort the table by Subject and TestScore (in descending order) using a Query step.
The Query step can also be used to deduplicate rows- that is- where all variable values match. This is achieved, as per the following GIF, by selecting (More Options) in the Columns pane of the Query node options and click Select distinct rows only.
The Sort step enables you to order your data by the values of one or more columns. In the following GIF, we sort the data by Subject and Name (both ascending by default). The output shows that rows 1 and 2 (Name of Barbara) and 4 and 5 (name Louise) have duplicate values for the columns designated in the Sort tab, specifically Subject and Name.
We can deduplicate the information above, using the Options button in the Sort Node details. Here, I have chosen to Maintain the relative order (this ensures that if multiple rows are detected, they are in the same order in the output table as the input table), and to Remove rows with duplicate values in sort columns. Two rows have been removed, row number 2 and row number 5.
Other sorting scenarios can be accomplished by investigating the different combinations of the options available by clicking under the Options tab of the Sort step.
Option | Reference | |
Specify the output order |
Session default1
(usually SORTEQUALS)
|
A system option that specifies that rows with identical values in the sorted columns are to retain the same relative positions in the output data set as in the input data set. |
Maintain relative order
(EQUALS)
|
A step option that specifies that rows with identical values in the sorted columns are to retain the same relative positions in the output data set as in the input data set. | |
Allow variation
(NOEQUALS)
|
Specifies that rows with identical BY variable values are to retain the same relative positions in the output data set as in the input data set. | |
Duplicate rows |
Keep all rows
|
Does not remove any rows. |
Remove rows with duplicate values in the sort columns
(NODUPKEY)
|
Compares values in the requested sorted columns of the current row with the previous for a match. If found, the current row is not written to output data set. | |
Remove duplicate rows (NODUPRECS)
|
Compares all values in the current row with the previous row for an exact match. If found, the current row is not written to output data set. | |
Force redundant sorting | (FORCE) |
The Sort step checks for the sort indicator before it sorts a data set so that data is not sorted again unnecessarily. By default, the Sort step does not sort a data set if the sort information matches the requested sort. You can use this option to override this behavior.
|
Use tags to sort large data | (TAGSORT) |
Useful when there is limited disk space because it stores only the sorted columns and the observation numbers in temporary files and then uses the files to retrieve the records on sort completion.
|
1 To verify the session default, use the OPTIONS procedure.
The Remove Duplicates step removes redundant rows from an input data source. Use this step when reading from a CAS table and generating the output as a CAS table to ensure duplicates are eliminated.
In the following GIF, I have used a CAS table as input, chosen to Remove duplicates across all columns, Show detailed log information, Write duplicate rows to the log and output, promote and save as a new CAS table.
The log shows that the row with the name of Louise was a duplicate and has been removed.
The table viewer enables you to interactively choose columns, apply filters, sort, and perform other actions on your SAS data.
In the following GIF, I have chosen to sort Name in Ascending order, Subject in Descending order and I have hidden TestScore. I have also used the (More Options) to select Show distinct rows > Visible columns to remove duplicate rows for the visible columns only.
The method you choose to use when sorting tables in SAS Studio depends on your requirements.
It would be rather remiss of me to not at least mention the powerful PROC SORT procedure! To find out more, please see this article How to sort data in SAS by Andy Ravenna (@AndyRav).
A useful paper if you have CAS tables, Parallel Programming with the DATA Step: Next Steps by David Bultman and Jason Secosky, for coding hints and tips.
Please feel free to add a comment to ask about other resources you may require!
To learn more about Sorting in SAS Studio and many other data managing techniques, please see the new Managing and Querying Data Using Flows in SAS Studio course.
Find more articles from SAS Global Enablement and Learning here.
And perhaps the table doesn't need to be sorted. Procedures like Proc Report and Tabulate support output appearing in FORMATTED order and use options like PRELOADFMT that with careful format design will display output in custom orders.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.