Help using Base SAS procedures

Difficulty with PROC SQL documentation--ORDER BY clause

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Difficulty with PROC SQL documentation--ORDER BY clause

I am using SAS 9.4. My question is, "When is it necessary to use an ORDER BY clause in PROC SQL?"

 

My question was inspired mostly by general statements about the PROC SQL ORDER BY clause in the SAS 9.4 documentation like this:

 

"If an ORDER BY clause is omitted, then a particular order to the output rows, such as the order in which the rows are encountered in the queried table, cannot be guaranteed—even if an index is present. Without an ORDER BY clause, the order of the output rows is determined by the internal processing of PROC SQL, the default collating sequence of SAS, and your operating environment."
 
The documentation elaborates on these ideas:
 
Spoiler
The order of the output rows that are returned is guaranteed only for columns that are specified in the ORDER BY clause.
Note: The ORDER BY clause does not guarantee that the order of the rows generated is deterministic. The ANSI standard for SQL allows the SQL implementation to specify whether the ORDER BY clause is stable or unstable. If the joint combination of values that is referenced in an ORDER BY clause for a query are unique in all of the rows that are being ordered, then the order of rows that is generated by ORDER BY is always deterministic. However, if the ORDER BY clause does not reference a joint combination of unique values, then the order of rows is not deterministic if ORDER BY is unstable.
 
I don't understand what the documentation means and have not found any relevant articles. If my question is too general, then I have an example of PROC SQL code below that was very similar to something I did at my job recently. To me, it does not seem that the cautions in the documentation about order of rows not being deterministic applies to either of the two datasets I made using PROC SQL here, as the two variables in the ORDER BY clauses uniquely identify observations in the resulting datasets:
 
*This sort is not necessary here, but the data I worked with had been sorted like so.;
proc sort data=sashelp.bmt out=sorted;
	by Group Status;
run;

proc sql;
	create table pre_want as
	select Group, Status, count(T) as freq_for_T
	from sorted
	group by Group, Status
	order by Group, Status;
	create table want as
	select *, count(Status) as freq_for_Status
	from pre_want
	group by Group
	order by Group, Status;
quit;
 
The dataset "want" is here (with variable labels removed):
 

Group

Status

freq_for_T

freq_for_Status

ALL

0

14

2

ALL

1

24

2

AML-High Risk

0

11

2

AML-High Risk

1

34

2

AML-Low Risk

0

29

2

AML-Low Risk

1

25

2

 
I am confused by the PROC SQL documentation and because I have seen many, many examples of users using PROC SQL without an ORDER BY clause (regardless of whether a GROUP BY clause was used). When is it necessary to use an ORDER BY clause with PROC SQL?

Accepted Solutions
Solution
‎04-27-2018 09:23 AM
Super User
Super User
Posts: 8,114

Re: Difficulty with PROC SQL documentation--ORDER BY clause

I think they are trying to say that if the ORDER BY variables do not uniquely identify each row then you cannot count on what order the "duplicates" well be sorted.  So if you have data that is uniquely identified by ID and DATE and you order by ID only then the order of the multiple dates within a single value of ID is not determined. You could run the same query twice and get different orders. 

 

View solution in original post


All Replies
Super User
Super User
Posts: 8,114

Re: Difficulty with PROC SQL documentation--ORDER BY clause

The SQL language does not guarantee the order of the output if you don't include an ORDER BY clause.

In general SAS's implementation will produce ordered results when you use a GROUP BY statement. 

But if SAS is pushing the query into a remote database then that database might not return the observations in order without an ORDER BY clause.

New Contributor
Posts: 3

Re: Difficulty with PROC SQL documentation--ORDER BY clause

Ok, thank you both for the quick replies. I am trying to reconcile your comments with this statement here from the SAS User's guide:

 

"The order of the output rows that are returned is guaranteed only for columns that are specified in the ORDER BY clause."

 

Which of these two interpretations is correct:

 

1.) "The order of the output rows that are returned is NOT guaranteed for the columns that are not specified in the ORDER BY clause, but is ONLY guaranteed for the columns that are specified in the ORDER BY clause." (This is definitely not what I want to happen, ever!)

 

2.) "The ORDER BY clause functions in the same manner as the BY statement in PROC SORT." (This IS what I want.)

Esteemed Advisor
Posts: 5,530

Re: Difficulty with PROC SQL documentation--ORDER BY clause

Look at option EQUALS in the sort procedure. What the documentation says about proc SQL ORDER BY clause is that the sorting is done (sometimes) with option NOEQUALS.

PG
Solution
‎04-27-2018 09:23 AM
Super User
Super User
Posts: 8,114

Re: Difficulty with PROC SQL documentation--ORDER BY clause

I think they are trying to say that if the ORDER BY variables do not uniquely identify each row then you cannot count on what order the "duplicates" well be sorted.  So if you have data that is uniquely identified by ID and DATE and you order by ID only then the order of the multiple dates within a single value of ID is not determined. You could run the same query twice and get different orders. 

 

New Contributor
Posts: 3

Re: Difficulty with PROC SQL documentation--ORDER BY clause

Very well, I believe that you and PGStats have solved it. Everything else I have read or seen about PROC SQL (or SQL in general) aligns with your interpretation. I appreciate the insights that you two shared.

Esteemed Advisor
Posts: 5,530

Re: Difficulty with PROC SQL documentation--ORDER BY clause

If you need your data to be ordered, then ask for it with an ORDER BY clause. That makes your intentions explicit and there is no cost if it is already ordered.

PG
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 204 views
  • 5 likes
  • 3 in conversation