We’re smarter together. Learn from this collection of community knowledge and add your expertise.

Custom Sorting with Proc SQL

by Regular Contributor on ‎07-05-2017 07:26 PM (1,769 Views)

Introduction

 

In any survey of which SAS procedures are used the most it’s a pretty fair bet that Proc Sort and Proc SQL will be, if not at the top, then certainly very close to the top of the list. We simply want our data sorted whether it’s for further By Group processing, printing reports or on-screen presentation and normally Proc Sort or Proc SQL (through its Order By clause) will do that quite simply for us. However, there are occasions when a simple ascending or descending sort doesn’t quite fit what we need, particularly if we’re preparing data for printing or on-screen display. This article will present three ways of producing specific custom sorts which don’t follow the standard sort orders.

 

Custom Sorting with the Case Statement

 

For our first example, we’ll take a scenario where we want one or more groups of records to appear in a specific order with the remainder in a natural ascending order. We can achieve this by using a Case Statement inside our Order By clause in Proc SQL. A case statement works like an if-then-else statement and can be used to generate values which will then be used in the sort instead of the actual record values.

 

If, for example, if we are doing an analysis of the SASHELP.SHOES data set and look at the result of the following query we can see that there are ten distinct values in the region field.

 

 

proc sql;
	select distinct region
	from sashelp.shoes;
quit;

 

shoes distinct regions.png

 

If we want to group the two North American regions (United States and Canada) together and have United States first in the table with Canada second and others in their natural ascending order we cannot do it simply by an ascending or descending sort. Instead we can use proc SQL with an Order By clause and include a Case statement as follows

 

proc sql;
	create table sortshoes
	as select *
	from sashelp.shoes
	order by case region when "United States" then 0 when "Canada" then 1 else 2 end,
region;
quit;

 

When Proc SQL carries out the sort and it encounters a record where the value of Region is “United States” that value will be treated as a zero, where the value is Canada that will be treated as a one and any other value will be treated as a two. A sort of these replacement values takes place giving the result below

 

shoes first sort A.png

 

shoes first sort B.png

 

Custom Sorting with Varying Secondary Sort Columns

 

Of course, it’s entirely possible to sort by more than one column by having primary and secondary sort columns specified in your Order By clause. It’s also possible, using a case statement, to vary the secondary column depending on the value found in the primary column.

 

This time we’ll assume we’re doing an analysis of the SASHELP.CARS data set and we want to sort by the type variable as the primary column and we want SUVs to be further sorted by mpg_city and any other type sorted by invoice. This time we can use the Case statement like so

 

 

proc sql;
	create table sortcars
	as select make, type, invoice, mpg_city
	from sashelp.cars
	order by type, case type when "SUV" then mpg_city else invoice end;
quit;

 

Our data set will then look like this - note that SUV is sorted before Sedan because of the way capitals are sorted and that Hybrids are sorted by Invoice and SUVs sorted by MPG_City

 

Cars Sort.png

 

Custom Sorting by a Substring Within a Column

 

Normally you want to sort by the full value of a column but just occasionally you might want to sort by a substring within the column. For this example, we’ll create a file containing the names of all British Prime Ministers from 1945 onwards

 

 

data prime_ministers1;
	length name $30;
	input name $;
	infile datalines;
	datalines;
Attlee,Clement
Churchill,Winston
Eden,Anthony
Macmillan,Harold
Douglas-Home,Alec
Wilson,Harold
Heath,Edward
Callaghan,James
Thatcher,Margaret
Major,John
Blair,Tony
Brown,Gordon
Cameron,David
May,Theresa
;
run;

 

You will see that there is only one column of names with surname first, then a comma and forename last. If I want to sort this by forename I can do so by using the scan function. You should note that because one of the names (Alec Douglas-Home) is double-barrelled I have had to use a modifier to ensure the name is split only on the comma and not the hyphen.

 

 

proc sql;
	create table prime_ministers2
	as select *
	from prime_ministers1
	order by scan(name,2,",");
quit;

 

After the Proc SQL runs the order is

 

Prime Ministers.png

 

Conclusion

 

As I mentioned in the introduction normally straight ascending or descending sorts will give you what you want but just occasionally you will come up against a scenario where they won’t. In these cases, the flexibility afforded by the Proc SQL Order By clause can come to the rescue and turn what seems like a complex problem into something quite straightforward.

 

If you have any comments or suggestions for other types of sort please leave them in the comments below, I’d love to hear from you.

Comments
by Occasional Contributor thesasuser
2 weeks ago

Excellent writeup

by Contributor tomrvincent
2 weeks ago

this fixed the sort bug.

 

order by lower(type)

Contributors
Your turn
Sign In!

Want to write an article? Sign in with your profile.