This tutorial shows how to sort SAS data sets using the SORT procedure to sort records in ascending or descending sequence, by one or more variables. If you are more of a visual learner, we’ve also covered this topic in a video on the SAS Users YouTube channel:
I love whenever I can mix my work with a personal hobby. In this case, we’ll be combining SAS with photography. I've created a simple data set with metadata based on photographs I’ve taken.
We will read in four total columns: aperture, ISO, focal_length, and country.
data photos; infile datalines dsd; input Aperture ISO Focal_Length Country: $11.; datalines; 5,100,95,Budapest 5,250,59,Budapest 5.6,600,132,Budapest 11,100,24,Budapest 8,100,70,Budapest 10,100,70,Budapest 4.1,250,65,USA 4.1,250,.,USA 5.6,400,125,USA 5.6,250,225,USA 5.6,250,250,USA 8,200,30, 8,200,20,New Zealand 8,200,135,New Zealand .,100,135,New Zealand 8,100,20,New Zealand 8,100,50,New Zealand run;
We’re going to read in this data and run a PROC PRINT on it so we can see how it looks by default and this is what we get:
The data was read in row by row, and so the first row of data in my code is the first row of data in my new data set.
Let’s use PROC SORT on this data set called photos and create a new sorted data set
called SORTED. We will first sort by aperture. This sorts the data by numerical order in the aperture column.
proc sort data=photos out=sorted; by aperture; run;
By default, the data is sorted by aperture is ascending order. You'll notice that I have a missing value for aperture in the first observation. Any missing value occurs first when sorted because it's considered the smallest possible value.
What if we want to sort not only by aperture but also by country?
proc sort data=photos out=sorted; by country aperture; run;
Reading from left to right, I'm firstly sorting by country. We call this the primary sort variable. Aperture is my secondary sort variable.
The countries are sorted alphabetically from A to Z. Within each country BY group, the data is then sorted by aperture in ascending order. We have a missing value for country in observation 1 and a missing value for aperture in observation 8. Since we sorted by country first, the missing value for country got sorted to the top first.
What if we want to sort by descending order instead of the default ascending? In front of “country” in our code, we add the DESCENDING keyword:
proc sort data=photos out=sorted; by descending country aperture; run;
Country is my primary sort variable and will be sorted in descending order. The countries are sorted alphabetically from Z to A, starting with the USA. Since we did not use the DESCENDING keyword in front of aperture, aperture will still be sorted in ascending order.
If we want the data sorted by aperture in descending order as well, we simply add another DESCENDING keyword in front of aperture:
proc sort data=photos out=sorted; by descending country descending aperture; run;
The DESCENDING keyword is required in front of every variable that we want to be sorted in descending order. Otherwise, it's going to be ascending by default. There is no ASCENDING keyword for PROC SORT. Typing in “ascending” will result in a syntax error.
We can have more than just two sort variables. If we add ISO, we have primary, secondary, and tertiary sort variables:
proc sort data=photos out=sorted; by descending country descending aperture ISO; run;
The data is first sorted by country in descending order. Then within each country (like the USA BY group in the image above), the data is then sorted by aperture also in descending order. Within aperture, ISO is sorted in ascending order by default.
The DATA step in SAS offers a powerful feature that allows you to take special actions when it encounters the first or last observation within a sorted group. Of course, this first requires that the data records are sorted. For example, let's pretend that I wanted to create a data set with records of just the longest focal length within each country. For this, I need the data sorted by Country (ascending by default) and then focal length (descending). Then, I can use the BY statement and FIRST-dot processing in the DATA step to keep just the first record in each country group:
proc sort data=photos out=sorted; by country descending focal_length; run; data top; set sorted; by country; if first.country then output; run;
Here's the output. Remember I had one record with a missing value for Country:
Note: this logic does not handle "ties." For robust handling of ties in your data, check out the RANK procedure.
Tip: FIRST-dot/LAST-dot processing is a great use case for the DATA step debugger (in SAS Enterprise Guide or SAS Studio with SAS Viya). You can see exactly how it works with your DATA step logic.
If you're familiar with SQL, you might notice that the syntax for a sorted result is a little different. Here's the basic form of an SQL query:
In PROC SQL, we would first select variables in the SELECT list and place the desired columns from whichever table we’re capturing information from with structured query language.
To sort in PROC SQL, we use the ORDER BY clause. Use the keywords after the variable names, instead of before the variable like PROC SORT. The keywords can also be shortened using DESC for descending and ASC for ascending. If you often switch between PROC SORT and PROC SQL, the placement of the DESCENDING keyword can be tricky to remember. "SORT before, SQL after"
One final thing to consider: is it possible that you don't need to sort your data? Sorting as a procedure is very resource-intensive. I recommend reading this great blog article by Chris Hemedinger that talks about when you might be able to skip sorting and save both time and resources. Hint: if your data is in a database (not a SAS data set), you probably don't need to use PROC SORT for sorting.
Thank you for reading and please leave me any questions you might have.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.