BookmarkSubscribeRSS Feed
AndrewHowell
Moderator

PROC SQL - if there is no "ORDER BY", does PROC SQL retain row order when using SET & UNION operators?

 

e.g,

proc sql;

    insert into table1
        select a, b, c, d from table2;

    create table table1 as
        select * from table1
        union all
        select * from table2;

quit;

(Yes, if it were me I'd put an "ORDER BY" clause, but we're trying to set up regression testing for some existing code.)

 

I'd assumed row order would be retained, but then read the following:

 

SUGI 31: 242-31 "SQL SET OPERATORS: SO HANDY VENN YOU NEED THEM", Howard Schreier, Howles Informatics.

"Since there is no ORDER BY clause in the SQL code, the SQL processor is not obligated to deliver its results in any particular order. The order we observe is basically a consequence of internal optimization; the processor is avoiding unnecessary work by simply preserving the order in which it encounters the rows."

 

I've searched for clarifying documentation but cannot find any.

 

Is there definitive documentation somewhere clarifying returned row order if there is no "ORDER BY" clause?

 

Thanks,

Andrew.

 

4 REPLIES 4
ballardw
Super User

The only thing I have found in documentation is no order is guaranteed when no Order BY is used. The most common order is the sort order of variables that are listed on the outermost select statement. But if you have a Select * all bets are off.

 

If you are at all worried about order then specify it.

 

If you need to UNION sets AND maintain some order then perhaps you should look into the DATA step Set. Which will preserve the order of each set from first to last in order of appearance on the set statement.

 

data newtable;
   set table1
       table2
   ;
run;

Less code and won't require the CORRESPONDING instruction to keep things aligned.

 

I don't think I would ever knowingly use the same dataset name as the result and the source of data in an SQL query. The warning below is why.

2626  proc sql;
2627     create table have as
2628     select *
2629     from have
2630     order by vehicle;
WARNING: This CREATE TABLE statement recursively references the target table. A consequence of
         this is a possible data integrity problem.
NOTE: Table USER.HAVE created, with 7 rows and 5 columns.

263
novinosrin
Tourmaline | Level 20

Hi @AndrewHowell  The very 3 letter abbreviation SQL for structured query language is synonymous with a name of "someone" by the Ben Forta. Some even go as far to believe almighty perhaps created EF Codd to relate and normalise tables and then Ben forta to play with it.

 

The author of the best-selling SQL book of all time writes, "the retrieved data  from underlying tables  is not displayed in a mere random order. If unsorted, data will typically be displayed in the order in which it appears in the underlying tables. This could be the order in which the data was added to the tables initially. However, if data was subsequently updated or deleted, the order will be affected by how the DBMS reuses reclaimed storage space. The end result is that you cannot (and should not) rely on the sort order if you do not explicitly control it. Relational database design theory states that the sequence of retrieved data cannot be assumed to have significance if ordering was not explicitly specified"

 

He explains the above applies to results from independent SELECT clauses that uses all kinds of SET operators including UNION ALL.

 

Hmmm(sighing with pride and pleasure in mentioning Ben). Have good one. 

Patrick
Opal | Level 21

@AndrewHowell 

The documentation found here clearly states that you won't get a guaranteed order unless using an Order By clause.

 

Unless an ORDER BY clause is included in the SELECT statement, then a particular order to the output rows, such as the order in which the rows are encountered in the queried table, cannot be guaranteed, even if an index is present. Without an ORDER BY clause, the order of the output rows is determined by the internal processing of PROC SQL, the default collating sequence of SAS, and your operating environment. Therefore, if you want your result table to appear in a particular order, then use the ORDER BY clause.

 

For the two examples you've posted:

If using the _method option one can see that no sorting is happening for sas7bdat files and this being a single threaded process with sequential read/write one could make the educated guess that here the row order will be maintained. I'd be rather surprised though if such detail is officially documented anywhere and things could also change from SAS version to SAS version.

AndrewHowell
Moderator

Thank all for the responses.

 

To clarify - yes, the new code will be better & tighter, but we are trying to set up some regression testing on the existing (less than stellar) code.

 

Cheers.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1497 views
  • 0 likes
  • 4 in conversation