This tutorial shows how to sort SAS data sets using the SORT procedure to sort records in ascending or descending sequence, by one or more variables. If you are more of a visual learner, we’ve also covered this topic in a video on the SAS Users YouTube channel:
Getting Started
I love whenever I can mix my work with a personal hobby. In this case, we’ll be combining SAS with photography. I've created a simple data set with metadata based on photographs I’ve taken.
We will read in four total columns: aperture, ISO, focal_length, and country.
data photos;
infile datalines dsd;
input Aperture ISO Focal_Length Country: $11.;
datalines;
5,100,95,Budapest
5,250,59,Budapest
5.6,600,132,Budapest
11,100,24,Budapest
8,100,70,Budapest
10,100,70,Budapest
4.1,250,65,USA
4.1,250,.,USA
5.6,400,125,USA
5.6,250,225,USA
5.6,250,250,USA
8,200,30,
8,200,20,New Zealand
8,200,135,New Zealand
.,100,135,New Zealand
8,100,20,New Zealand
8,100,50,New Zealand
run;
Aperture controls the amount of light that reaches the image sensor. Smaller numbers mean that more light comes into the camera and less is in focus. We sometimes refer to that as shallow depth of field.
ISO measures the sensitivity of the image sensor. Higher numbers allow you to capture darker scenes. This means that higher ISOs are probably photos that I took at dusk or night.
The Focal_length of the lens indicates the distance of the subject in the photograph. Smaller, or shorter, focal lengths have a wider angle of view.
Country is simply the country where the photo was taken.
We’re going to read in this data and run a PROC PRINT on it so we can see how it looks by default and this is what we get:
The data was read in row by row, and so the first row of data in my code is the first row of data in my new data set.
Sorting your data with PROC SORT
Let’s use PROC SORT on this data set called photos and create a new sorted data set
called SORTED. We will first sort by aperture. This sorts the data by numerical order in the aperture column.
proc sort data=photos out=sorted;
by aperture;
run;
By default, the data is sorted by aperture is ascending order. You'll notice that I have a missing value for aperture in the first observation. Any missing value occurs first when sorted because it's considered the smallest possible value.
What if we want to sort not only by aperture but also by country?
proc sort data=photos out=sorted;
by country aperture;
run;
Reading from left to right, I'm firstly sorting by country. We call this the primary sort variable. Aperture is my secondary sort variable.
The countries are sorted alphabetically from A to Z. Within each country BY group, the data is then sorted by aperture in ascending order. We have a missing value for country in observation 1 and a missing value for aperture in observation 8. Since we sorted by country first, the missing value for country got sorted to the top first.
How to sort the data in descending order
What if we want to sort by descending order instead of the default ascending? In front of “country” in our code, we add the DESCENDING keyword:
proc sort data=photos out=sorted;
by descending country aperture;
run;
Country is my primary sort variable and will be sorted in descending order. The countries are sorted alphabetically from Z to A, starting with the USA. Since we did not use the DESCENDING keyword in front of aperture, aperture will still be sorted in ascending order.
If we want the data sorted by aperture in descending order as well, we simply add another DESCENDING keyword in front of aperture:
proc sort data=photos out=sorted;
by descending country descending aperture;
run;
The DESCENDING keyword is required in front of every variable that we want to be sorted in descending order. Otherwise, it's going to be ascending by default. There is no ASCENDING keyword for PROC SORT. Typing in “ascending” will result in a syntax error.
We can have more than just two sort variables. If we add ISO, we have primary, secondary, and tertiary sort variables:
proc sort data=photos out=sorted;
by descending country descending aperture ISO;
run;
The data is first sorted by country in descending order. Then within each country (like the USA BY group in the image above), the data is then sorted by aperture also in descending order. Within aperture, ISO is sorted in ascending order by default.
Sorting data before FIRST.variable and LAST.variable processing
The DATA step in SAS offers a powerful feature that allows you to take special actions when it encounters the first or last observation within a sorted group. Of course, this first requires that the data records are sorted. For example, let's pretend that I wanted to create a data set with records of just the longest focal length within each country. For this, I need the data sorted by Country (ascending by default) and then focal length (descending). Then, I can use the BY statement and FIRST-dot processing in the DATA step to keep just the first record in each country group:
proc sort data=photos out=sorted;
by country descending focal_length;
run;
data top;
set sorted;
by country;
if first.country then
output;
run;
Here's the output. Remember I had one record with a missing value for Country:
Note: this logic does not handle "ties." For robust handling of ties in your data, check out the RANK procedure.
FIRST-dot and LAST-dot processing is a topic that deserves its own tutorial, but you can learn more from this article by @Rick_SAS.
Tip: FIRST-dot/LAST-dot processing is a great use case for the DATA step debugger (in SAS Enterprise Guide or SAS Studio with SAS Viya). You can see exactly how it works with your DATA step logic.
Comparing PROC SORT to PROC SQL and ORDER BY
If you're familiar with SQL, you might notice that the syntax for a sorted result is a little different. Here's the basic form of an SQL query:
SELECT
select_list
FROM
table
ORDER BY
country DESC,
aperture DESC
ISO ASC;
In PROC SQL, we would first select variables in the SELECT list and place the desired columns from whichever table we’re capturing information from with structured query language.
To sort in PROC SQL, we use the ORDER BY clause. Use the keywords after the variable names, instead of before the variable like PROC SORT. The keywords can also be shortened using DESC for descending and ASC for ascending. If you often switch between PROC SORT and PROC SQL, the placement of the DESCENDING keyword can be tricky to remember. "SORT before, SQL after"
Why you might not want to sort your data
One final thing to consider: is it possible that you don't need to sort your data? Sorting as a procedure is very resource-intensive. I recommend reading this great blog article by Chris Hemedinger that talks about when you might be able to skip sorting and save both time and resources. Hint: if your data is in a database (not a SAS data set), you probably don't need to use PROC SORT for sorting.
Thank you for reading and please leave me any questions you might have.
... View more