BookmarkSubscribeRSS Feed

How to sort data in SAS

Started ‎08-14-2020 by
Modified ‎08-14-2020 by
Views 31,735

This tutorial shows how to sort SAS data sets using the SORT procedure to sort records in ascending or descending sequence, by one or more variables.  If you are more of a visual learner, we’ve also covered this topic in a video on the SAS Users YouTube channel:

 

 

Getting Started

I love whenever I can mix my work with a personal hobby. In this case, we’ll be combining SAS with photography. I've created a simple data set with metadata based on photographs I’ve taken. 

 

We will read in four total columns: aperture, ISO, focal_length, and country.

 

data photos;
  infile datalines dsd;
  input Aperture ISO Focal_Length Country: $11.;
  datalines;
5,100,95,Budapest
5,250,59,Budapest
5.6,600,132,Budapest
11,100,24,Budapest
8,100,70,Budapest
10,100,70,Budapest
4.1,250,65,USA
4.1,250,.,USA
5.6,400,125,USA
5.6,250,225,USA
5.6,250,250,USA
8,200,30,
8,200,20,New Zealand
8,200,135,New Zealand
.,100,135,New Zealand
8,100,20,New Zealand
8,100,50,New Zealand
run;
  • Aperture controls the amount of light that reaches the image sensor. Smaller numbers mean that more light comes into the camera and less is in focus. We sometimes refer to that as shallow depth of field.
  • ISO measures the sensitivity of the image sensor. Higher numbers allow you to capture darker scenes. This means that higher ISOs are probably photos that I took at dusk or night.
  • The Focal_length of the lens indicates the distance of the subject in the photograph. Smaller, or shorter, focal lengths have a wider angle of view.
  • Country is simply the country where the photo was taken.

 

We’re going to read in this data and run a PROC PRINT on it so we can see how it looks by default and this is what we get:

 

02_DefaultOrder.png

 

The data was read in row by row, and so the first row of data in my code is the first row of data in my new data set.

 

Sorting your data with PROC SORT

Let’s use PROC SORT on this data set called photos and create a new sorted data set

called SORTED. We will first sort by aperture. This sorts the data by numerical order in the aperture column.

 

proc sort data=photos out=sorted;
      by aperture;
run;
03_SortAperture.png

 

By default, the data is sorted by aperture is ascending order. You'll notice that I have a missing value for aperture in the first observation. Any missing value occurs first when sorted because it's considered the smallest possible value.

 

What if we want to sort not only by aperture but also by country?

 

proc sort data=photos out=sorted;
      by country aperture;
run;

Reading from left to right, I'm firstly sorting by country. We call this the primary sort variable. Aperture is my secondary sort variable.

 

04_SortCountryAperture.png

 

The countries are sorted alphabetically from A to Z. Within each country BY group, the data is then sorted by aperture in ascending order. We have a missing value for country in observation 1 and a missing value for aperture in observation 8. Since we sorted by country first, the missing value for country got sorted to the top first.

 

How to sort the data in descending order

What if we want to sort by descending order instead of the default ascending? In front of “country” in our code, we add the DESCENDING keyword:

 

proc sort data=photos out=sorted;
      by descending country aperture;
run;
05_SortCountryDescending.png

 

Country is my primary sort variable and will be sorted in descending order. The countries are sorted alphabetically from Z to A, starting with the USA. Since we did not use the DESCENDING keyword in front of aperture, aperture will still be sorted in ascending order.

 

If we want the data sorted by aperture in descending order as well, we simply add another DESCENDING keyword in front of aperture:

 

proc sort data=photos out=sorted;
      by descending country descending aperture;
run;

The DESCENDING keyword is required in front of every variable that we want to be sorted in descending order. Otherwise, it's going to be ascending by default. There is no ASCENDING keyword for PROC SORT. Typing in “ascending” will result in a syntax error.

 

We can have more than just two sort variables. If we add ISO, we have primary, secondary, and tertiary sort variables:

 

proc sort data=photos out=sorted;
      by descending country descending aperture ISO;
run;
06_Tertiary.png

 

The data is first sorted by country in descending order. Then within each country (like the USA BY group in the image above), the data is then sorted by aperture also in descending order. Within aperture, ISO is sorted in ascending order by default.

 

Sorting data before FIRST.variable and LAST.variable processing

The DATA step in SAS offers a powerful feature that allows you to take special actions when it encounters the first or last observation within a sorted group.  Of course, this first requires that the data records are sorted.  For example, let's pretend that I wanted to create a data set with records of just the longest focal length within each country.  For this, I need the data sorted by Country (ascending by default) and then focal length (descending).  Then, I can use the BY statement and FIRST-dot processing in the DATA step to keep just the first record in each country group:

 

proc sort data=photos out=sorted;
 by country descending focal_length;
run;

data top;
 set sorted;
 by country;
 if first.country then 
   output;
run;

 

Here's the output. Remember I had one record with a missing value for Country:

 

ChrisHemedinger_0-1597411433444.png

Note: this logic does not handle "ties."  For robust handling of ties in your data, check out the RANK procedure.

 

FIRST-dot and LAST-dot processing is a topic that deserves its own tutorial, but you can learn more from this article by @Rick_SAS.

 

Tip: FIRST-dot/LAST-dot processing is a great use case for the DATA step debugger (in SAS Enterprise Guide or SAS Studio with SAS Viya).  You can see exactly how it works with your DATA step logic.

 

ChrisHemedinger_1-1597411584218.png

 

Comparing PROC SORT to PROC SQL and ORDER BY

If you're familiar with SQL, you might notice that the syntax for a sorted result is a little different.  Here's the basic form of an SQL query:

 

SELECT

      select_list

FROM

      table

ORDER BY

      country     DESC,

      aperture    DESC

      ISO         ASC;

 

In PROC SQL, we would first select variables in the SELECT list and place the desired columns from whichever table we’re capturing information from with structured query language.

 

To sort in PROC SQL, we use the ORDER BY clause. Use the keywords after the variable names, instead of before the variable like PROC SORT. The keywords can also be shortened using DESC for descending and ASC for ascending.  If you often switch between PROC SORT and PROC SQL, the placement of the DESCENDING keyword can be tricky to remember.  "SORT before, SQL after"

 

Why you might not want to sort your data

One final thing to consider: is it possible that you don't need to sort your data? Sorting as a procedure is very resource-intensive. I recommend reading this great blog article by Chris Hemedinger that talks about when you might be able to skip sorting and save both time and resources.  Hint: if your data is in a database (not a SAS data set), you probably don't need to use PROC SORT for sorting.

 

Thank you for reading and please leave me any questions you might have.

 

Version history
Last update:
‎08-14-2020 09:53 AM
Updated by:
Contributors

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags