I am using SAS 9.4. My question is, "When is it necessary to use an ORDER BY clause in PROC SQL?"
My question was inspired mostly by general statements about the PROC SQL ORDER BY clause in the SAS 9.4 documentation like this:
*This sort is not necessary here, but the data I worked with had been sorted like so.;
proc sort data=sashelp.bmt out=sorted;
by Group Status;
run;
proc sql;
create table pre_want as
select Group, Status, count(T) as freq_for_T
from sorted
group by Group, Status
order by Group, Status;
create table want as
select *, count(Status) as freq_for_Status
from pre_want
group by Group
order by Group, Status;
quit;
Group | Status | freq_for_T | freq_for_Status |
ALL | 0 | 14 | 2 |
ALL | 1 | 24 | 2 |
AML-High Risk | 0 | 11 | 2 |
AML-High Risk | 1 | 34 | 2 |
AML-Low Risk | 0 | 29 | 2 |
AML-Low Risk | 1 | 25 | 2 |
I think they are trying to say that if the ORDER BY variables do not uniquely identify each row then you cannot count on what order the "duplicates" well be sorted. So if you have data that is uniquely identified by ID and DATE and you order by ID only then the order of the multiple dates within a single value of ID is not determined. You could run the same query twice and get different orders.
The SQL language does not guarantee the order of the output if you don't include an ORDER BY clause.
In general SAS's implementation will produce ordered results when you use a GROUP BY statement.
But if SAS is pushing the query into a remote database then that database might not return the observations in order without an ORDER BY clause.
Ok, thank you both for the quick replies. I am trying to reconcile your comments with this statement here from the SAS User's guide:
"The order of the output rows that are returned is guaranteed only for columns that are specified in the ORDER BY clause."
Which of these two interpretations is correct:
1.) "The order of the output rows that are returned is NOT guaranteed for the columns that are not specified in the ORDER BY clause, but is ONLY guaranteed for the columns that are specified in the ORDER BY clause." (This is definitely not what I want to happen, ever!)
2.) "The ORDER BY clause functions in the same manner as the BY statement in PROC SORT." (This IS what I want.)
Look at option EQUALS in the sort procedure. What the documentation says about proc SQL ORDER BY clause is that the sorting is done (sometimes) with option NOEQUALS.
I think they are trying to say that if the ORDER BY variables do not uniquely identify each row then you cannot count on what order the "duplicates" well be sorted. So if you have data that is uniquely identified by ID and DATE and you order by ID only then the order of the multiple dates within a single value of ID is not determined. You could run the same query twice and get different orders.
Very well, I believe that you and PGStats have solved it. Everything else I have read or seen about PROC SQL (or SQL in general) aligns with your interpretation. I appreciate the insights that you two shared.
If you need your data to be ordered, then ask for it with an ORDER BY clause. That makes your intentions explicit and there is no cost if it is already ordered.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.