BookmarkSubscribeRSS Feed

How do I sort thee? Let me count the ‘non-coding’ ways: Sorting in SAS Studio

Started 7 hours ago by
Modified 7 hours ago by
Views 63

There are numerous methods of sorting tables in SAS Studio, including using a ‘good old’ PROC SORT! During this post, we explore the importance of sorting SAS tables and evaluate each method, focusing on the point-and-click techniques in.

 

Sorting Data

 

Sorting data can be a helpful or necessary step in exploring your data. Some data processing steps need sorted input data. Other benefits of sorting include being able to visually examine the high or low values and even identifying and removing duplicate rows.

 

01_GR_Pic1.png

 

 

When sorting data, you can specify which columns to sort by.

 

 

02_GR_Pic2.png

 

For example, by adding Subject as the primary column, this data table will be sorted in ascending values of Subject (so, in alphabetical order, A to Z). Notice that the original order has been honored by default.

 

When adding multiple columns, SAS sorts the primary column first and then the secondary column within the primary column. In this example, Subject is sorted in ascending order and then TestScore is sorted in ascending order, within the Subject values.

 

You can specify the order of the rows to view large to small values by using the descending option.

 

03_GR_Pic3.png

 

For example, these rows are primarily sorted by Subject in ascending order and then, within the subject, by descending values of TestScore.

 

When sorting, the default behavior is to keep all records (rows), including duplicates.

 

04_GR_Pic4.png

 

Notice that we have a duplicate value for Name and for Subject in the first two rows and duplicate Name, Subject, and TestScore values in the last two rows.

 

If the direction is to remove duplicate rows, all values for a row are compared to the previous row written to the output table. If an exact match is found, the row is removed. In this example, Louise with a French test result of 92 is removed, because it’s a duplicate row.

 

If the instruction is to remove rows with duplicate values in the sort column, duplicates are removed based on the values of the columns on which the output is sorted. In this example, the second row of Bobby with a French result has been removed and the second row of Louise with a French result has been removed.

 

Techniques for Sorting Data

With multiple methods available, each sorting technique serves a specific initial purpose based on the requirements.  

 

1.      Query Step

The Query step is an extremely versatile tool for data manipulation. You can select, sort, and create new columns, join tables, summarise data and filter both input or summarised data.

 

The CLASS_TESTDUP data contains seven rows, the following GIF demonstrates how to sort the table by Subject and TestScore (in descending order) using a Query step.

 

05_GR_QuerySRT-1.gif

 

The Query step can also be used to deduplicate rows- that is- where all variable values match. This is achieved, as per the following GIF, by selecting (More Options) in the Columns pane of the Query node options and click Select distinct rows only.

 

06_GR_QuerySRTDeDup-1.gif

 

 

2.      Sort Step

The Sort step enables you to order your data by the values of one or more columns. In the following GIF, we sort the data by Subject and Name (both ascending by default). The output shows that rows 1 and 2 (Name of Barbara) and 4 and 5  (name Louise) have duplicate values for the columns designated in the Sort tab, specifically Subject and Name.

 

07_GR_SortSRT-1.gif

 

We can deduplicate the information above, using the Options button in the Sort Node details. Here, I have chosen to Maintain the relative order (this ensures that if multiple rows are detected, they are in the same order in the output table as the input table), and to Remove rows with duplicate values in sort columns. Two rows have been removed, row number 2 and row number 5.

 

08_GR_SortDeDup-1.gif

 

Other sorting scenarios can be accomplished by investigating the different combinations of the options available by clicking GR_SortOptions.pngunder the Options tab of the Sort step.

 

Option Reference
Specify the output order  

Session default1

 

(usually SORTEQUALS)

 

A system option that specifies that rows with identical values in the sorted columns are to retain the same relative positions in the output data set as in the input data set.

Maintain relative order

 

(EQUALS)  

 

A step option that specifies that rows with identical values in the sorted columns are to retain the same relative positions in the output data set as in the input data set.  

Allow variation

 

(NOEQUALS)  

 

Specifies that rows with identical BY variable values are to retain the same relative positions in the output data set as in the input data set.  
Duplicate rows  

 

 

Keep all rows 

 

 

Does not remove any rows.  

Remove rows with duplicate values in the sort columns

 

(NODUPKEY)  

 

Compares values in the requested sorted columns of the current row with the previous for a match. If found, the current row is not written to output data set.  

 

Remove duplicate rows

(NODUPRECS)  

 

 

Compares all values in the current row with the previous row for an exact match. If found, the current row is not written to output data set.  
Force redundant sorting  (FORCE)  

 

The Sort step checks for the sort indicator before it sorts a data set so that data is not sorted again unnecessarily. By default, the Sort step does not sort a data set if the sort information matches the requested sort. You can use this option to override this behavior.  

 

Use tags to sort large data   (TAGSORT)  

 

Useful when there is limited disk space because it stores only the sorted columns and the observation numbers in temporary files and then uses the files to retrieve the records on sort completion.  

 

 

1 To verify the session default, use the OPTIONS procedure.

 

 

3.      Remove Duplicates Step

The Remove Duplicates step removes redundant rows from an input data source. Use this step when reading from a CAS table and generating the output as a CAS table to ensure duplicates are eliminated.

 

In the following GIF, I have used a CAS table as input, chosen to Remove duplicates across all columns, Show detailed log information, Write duplicate rows to the log and output, promote and save as a new CAS table.

 

The log shows that the row with the name of Louise was a duplicate and has been removed.

 

09_GR_DeDupDeDup-1.gif

 

4.      Table Viewer

The table viewer enables you to interactively choose columns, apply filters, sort, and perform other actions on your SAS data.

 

In the following GIF, I have chosen to sort Name in Ascending order, Subject in Descending order and I have hidden TestScore. I have also used the (More Options) to select Show distinct rows > Visible columns to remove duplicate rows for the visible columns only.

 

10_GR_TableVDedup-1.gif

 

Comparison of SAS Table Sorting Techniques

The method you choose to use when sorting tables in SAS Studio depends on your requirements.

 

  • Opt for the Query step if your requirements extend beyond merely sorting or removing duplicates.
  • Storing a CAS table in a sorted order is not necessary as the order of rows in a CAS table is not predictable.
  • Select the Remove Duplicates step to eliminate duplicate rows from a CAS table and generate the output as a CAS table.

11_GR_Pic5.png

 

Other Resources

 

It would be rather remiss of me to not at least mention the powerful PROC SORT procedure! To find out more, please see this article How to sort data in SAS by Andy Ravenna (@AndyRav).

 

A useful paper if you have CAS tables, Parallel Programming with the DATA Step: Next Steps by David Bultman and Jason Secosky, for coding hints and tips.

 

Please feel free to add a comment to ask about other resources you may require!

 

To learn more about Sorting in SAS Studio and many other data managing techniques, please see the new Managing and Querying Data Using Flows in SAS Studio course.

 

 

Find more articles from SAS Global Enablement and Learning here.

Comments

And perhaps the table doesn't need to be sorted. Procedures like Proc Report and Tabulate support output appearing in FORMATTED order and use options like PRELOADFMT that with careful format design will display output in custom orders.

Version history
Last update:
7 hours ago
Updated by:
Contributors

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags