I am using SAS 9.4. My question is, "When is it necessary to use an ORDER BY clause in PROC SQL?" My question was inspired mostly by general statements about the PROC SQL ORDER BY clause in the SAS 9.4 documentation like this: "If an ORDER BY clause is omitted, then a particular order to the output rows, such as the order in which the rows are encountered in the queried table, cannot be guaranteed—even if an index is present. Without an ORDER BY clause, the order of the output rows is determined by the internal processing of PROC SQL, the default collating sequence of SAS, and your operating environment." The documentation elaborates on these ideas: The order of the output rows that are returned is guaranteed only for columns that are specified in the ORDER BY clause. Note: The ORDER BY clause does not guarantee that the order of the rows generated is deterministic. The ANSI standard for SQL allows the SQL implementation to specify whether the ORDER BY clause is stable or unstable. If the joint combination of values that is referenced in an ORDER BY clause for a query are unique in all of the rows that are being ordered, then the order of rows that is generated by ORDER BY is always deterministic. However, if the ORDER BY clause does not reference a joint combination of unique values, then the order of rows is not deterministic if ORDER BY is unstable. I don't understand what the documentation means and have not found any relevant articles. If my question is too general, then I have an example of PROC SQL code below that was very similar to something I did at my job recently. To me, it does not seem that the cautions in the documentation about order of rows not being deterministic applies to either of the two datasets I made using PROC SQL here, as the two variables in the ORDER BY clauses uniquely identify observations in the resulting datasets: *This sort is not necessary here, but the data I worked with had been sorted like so.;
proc sort data=sashelp.bmt out=sorted;
by Group Status;
run;
proc sql;
create table pre_want as
select Group, Status, count(T) as freq_for_T
from sorted
group by Group, Status
order by Group, Status;
create table want as
select *, count(Status) as freq_for_Status
from pre_want
group by Group
order by Group, Status;
quit; The dataset "want" is here (with variable labels removed): Group Status freq_for_T freq_for_Status ALL 0 14 2 ALL 1 24 2 AML-High Risk 0 11 2 AML-High Risk 1 34 2 AML-Low Risk 0 29 2 AML-Low Risk 1 25 2 I am confused by the PROC SQL documentation and because I have seen many, many examples of users using PROC SQL without an ORDER BY clause (regardless of whether a GROUP BY clause was used). When is it necessary to use an ORDER BY clause with PROC SQL?
... View more