BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Yosef
Fluorite | Level 6

I am using SAS 9.4. My question is, "When is it necessary to use an ORDER BY clause in PROC SQL?"

 

My question was inspired mostly by general statements about the PROC SQL ORDER BY clause in the SAS 9.4 documentation like this:

 

"If an ORDER BY clause is omitted, then a particular order to the output rows, such as the order in which the rows are encountered in the queried table, cannot be guaranteed—even if an index is present. Without an ORDER BY clause, the order of the output rows is determined by the internal processing of PROC SQL, the default collating sequence of SAS, and your operating environment."
 
The documentation elaborates on these ideas:
 
Spoiler
The order of the output rows that are returned is guaranteed only for columns that are specified in the ORDER BY clause.
Note: The ORDER BY clause does not guarantee that the order of the rows generated is deterministic. The ANSI standard for SQL allows the SQL implementation to specify whether the ORDER BY clause is stable or unstable. If the joint combination of values that is referenced in an ORDER BY clause for a query are unique in all of the rows that are being ordered, then the order of rows that is generated by ORDER BY is always deterministic. However, if the ORDER BY clause does not reference a joint combination of unique values, then the order of rows is not deterministic if ORDER BY is unstable.
 
I don't understand what the documentation means and have not found any relevant articles. If my question is too general, then I have an example of PROC SQL code below that was very similar to something I did at my job recently. To me, it does not seem that the cautions in the documentation about order of rows not being deterministic applies to either of the two datasets I made using PROC SQL here, as the two variables in the ORDER BY clauses uniquely identify observations in the resulting datasets:
 
*This sort is not necessary here, but the data I worked with had been sorted like so.;
proc sort data=sashelp.bmt out=sorted;
	by Group Status;
run;

proc sql;
	create table pre_want as
	select Group, Status, count(T) as freq_for_T
	from sorted
	group by Group, Status
	order by Group, Status;
	create table want as
	select *, count(Status) as freq_for_Status
	from pre_want
	group by Group
	order by Group, Status;
quit;
 
The dataset "want" is here (with variable labels removed):
 

Group

Status

freq_for_T

freq_for_Status

ALL

0

14

2

ALL

1

24

2

AML-High Risk

0

11

2

AML-High Risk

1

34

2

AML-Low Risk

0

29

2

AML-Low Risk

1

25

2

 
I am confused by the PROC SQL documentation and because I have seen many, many examples of users using PROC SQL without an ORDER BY clause (regardless of whether a GROUP BY clause was used). When is it necessary to use an ORDER BY clause with PROC SQL?
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

I think they are trying to say that if the ORDER BY variables do not uniquely identify each row then you cannot count on what order the "duplicates" well be sorted.  So if you have data that is uniquely identified by ID and DATE and you order by ID only then the order of the multiple dates within a single value of ID is not determined. You could run the same query twice and get different orders. 

 

View solution in original post

6 REPLIES 6
Tom
Super User Tom
Super User

The SQL language does not guarantee the order of the output if you don't include an ORDER BY clause.

In general SAS's implementation will produce ordered results when you use a GROUP BY statement. 

But if SAS is pushing the query into a remote database then that database might not return the observations in order without an ORDER BY clause.

Yosef
Fluorite | Level 6

Ok, thank you both for the quick replies. I am trying to reconcile your comments with this statement here from the SAS User's guide:

 

"The order of the output rows that are returned is guaranteed only for columns that are specified in the ORDER BY clause."

 

Which of these two interpretations is correct:

 

1.) "The order of the output rows that are returned is NOT guaranteed for the columns that are not specified in the ORDER BY clause, but is ONLY guaranteed for the columns that are specified in the ORDER BY clause." (This is definitely not what I want to happen, ever!)

 

2.) "The ORDER BY clause functions in the same manner as the BY statement in PROC SORT." (This IS what I want.)

PGStats
Opal | Level 21

Look at option EQUALS in the sort procedure. What the documentation says about proc SQL ORDER BY clause is that the sorting is done (sometimes) with option NOEQUALS.

PG
Tom
Super User Tom
Super User

I think they are trying to say that if the ORDER BY variables do not uniquely identify each row then you cannot count on what order the "duplicates" well be sorted.  So if you have data that is uniquely identified by ID and DATE and you order by ID only then the order of the multiple dates within a single value of ID is not determined. You could run the same query twice and get different orders. 

 

Yosef
Fluorite | Level 6

Very well, I believe that you and PGStats have solved it. Everything else I have read or seen about PROC SQL (or SQL in general) aligns with your interpretation. I appreciate the insights that you two shared.

PGStats
Opal | Level 21

If you need your data to be ordered, then ask for it with an ORDER BY clause. That makes your intentions explicit and there is no cost if it is already ordered.

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2125 views
  • 5 likes
  • 3 in conversation