In any survey of which SAS procedures are used the most it’s a pretty fair bet that Proc Sort and Proc SQL will be, if not at the top, then certainly very close to the top of the list. We simply want our data sorted whether it’s for further By Group processing, printing reports or on-screen presentation and normally Proc Sort or Proc SQL (through its Order By clause) will do that quite simply for us. However, there are occasions when a simple ascending or descending sort doesn’t quite fit what we need, particularly if we’re preparing data for printing or on-screen display. This article will present three ways of producing specific custom sorts which don’t follow the standard sort orders.
For our first example, we’ll take a scenario where we want one or more groups of records to appear in a specific order with the remainder in a natural ascending order. We can achieve this by using a Case Statement inside our Order By clause in Proc SQL. A case statement works like an if-then-else statement and can be used to generate values which will then be used in the sort instead of the actual record values.
If, for example, if we are doing an analysis of the SASHELP.SHOES data set and look at the result of the following query we can see that there are ten distinct values in the region field.
proc sql; select distinct region from sashelp.shoes; quit;
If we want to group the two North American regions (United States and Canada) together and have United States first in the table with Canada second and others in their natural ascending order we cannot do it simply by an ascending or descending sort. Instead we can use proc SQL with an Order By clause and include a Case statement as follows
proc sql; create table sortshoes as select * from sashelp.shoes order by case region when "United States" then 0 when "Canada" then 1 else 2 end,
When Proc SQL carries out the sort and it encounters a record where the value of Region is “United States” that value will be treated as a zero, where the value is Canada that will be treated as a one and any other value will be treated as a two. A sort of these replacement values takes place giving the result below
Of course, it’s entirely possible to sort by more than one column by having primary and secondary sort columns specified in your Order By clause. It’s also possible, using a case statement, to vary the secondary column depending on the value found in the primary column.
This time we’ll assume we’re doing an analysis of the SASHELP.CARS data set and we want to sort by the type variable as the primary column and we want SUVs to be further sorted by mpg_city and any other type sorted by invoice. This time we can use the Case statement like so
proc sql; create table sortcars as select make, type, invoice, mpg_city from sashelp.cars order by type, case type when "SUV" then mpg_city else invoice end; quit;
Our data set will then look like this - note that SUV is sorted before Sedan because of the way capitals are sorted and that Hybrids are sorted by Invoice and SUVs sorted by MPG_City
Normally you want to sort by the full value of a column but just occasionally you might want to sort by a substring within the column. For this example, we’ll create a file containing the names of all British Prime Ministers from 1945 onwards
data prime_ministers1; length name $30; input name $; infile datalines; datalines; Attlee,Clement Churchill,Winston Eden,Anthony Macmillan,Harold Douglas-Home,Alec Wilson,Harold Heath,Edward Callaghan,James Thatcher,Margaret Major,John Blair,Tony Brown,Gordon Cameron,David May,Theresa ; run;
You will see that there is only one column of names with surname first, then a comma and forename last. If I want to sort this by forename I can do so by using the scan function. You should note that because one of the names (Alec Douglas-Home) is double-barrelled I have had to use a modifier to ensure the name is split only on the comma and not the hyphen.
proc sql; create table prime_ministers2 as select * from prime_ministers1 order by scan(name,2,","); quit;
After the Proc SQL runs the order is
As I mentioned in the introduction normally straight ascending or descending sorts will give you what you want but just occasionally you will come up against a scenario where they won’t. In these cases, the flexibility afforded by the Proc SQL Order By clause can come to the rescue and turn what seems like a complex problem into something quite straightforward.
If you have any comments or suggestions for other types of sort please leave them in the comments below, I’d love to hear from you.