DATA Step, Macro, Functions and more

Is order by needed when group is used?

Accepted Solution Solved
Reply
PROC Star
Posts: 7,474
Accepted Solution

Is order by needed when group is used?

Take the following code:

proc sql;

  create table want as

    select age,sum(height) as total_height

      from sashelp.class

        group by age

  ;

quit;

It seems to always create a table in age order even though their isn't an order by statement.  Is that documented anywhere?

Thanks in advance,

Art


Accepted Solutions
Solution
‎09-13-2012 10:08 PM
Occasional Contributor
Posts: 9

Re: Is order by needed when group is used?

The SQL documentation -- ANSI not SAS -- specifies that the ordering is not guaranteed unless an order by statement is used.  So, for portable SQL, you should have the order by statement if you need the output sorted.  If SAS happens to produce an ordered output with a group, then it is a side effect of how it organizes the data in memory and then walks through the output.  It may produce a hash table and then walk the hash, effectively performing a hashsort.  If you want provably correct code, use the order by.


View solution in original post


All Replies
Super Contributor
Posts: 1,636

Re: Is order by needed when group is used?

Hi Art,

I think order by is not necessary.

example:

data have;

input id $ age;

cards;

e 30

d 20

e 30

d 25

a 10

e 35

d 25

a 15

;

proc sql;

  create table want as

    select id,age,sum(age) as total_age

      from have

        group by id,age

  ;

quit;

proc print;run;

PROC Star
Posts: 7,474

Re: Is order by needed when group is used?

But is that documented anywhere?  If not, I'm concerned that it won't ALWAYS behave that way.  I hate to depend on something that isn't guaranteed to always perform a certain way.  The documentation says to use an 'order by' statement to insure expected order.

Occasional Contributor
Posts: 19

Re: Is order by needed when group is used?

ORDER BY is not necessary.

If there is no ORDER BY statement and your data come out ordered, then this is because this is the way they exist in your dataset.

For documentation, look at the SAS ORDER= statement documentation.

To test this for yourself, reorder your dataset and then rerun your PROC SQL statement.

BUT to do that, you need to be careful about how the data is structured after it is grouped.  Your dataset is not ordered by age -- but is is AFTER it has been grouped by ID.

PROC Star
Posts: 7,474

Re: Is order by needed when group is used?

Posted in reply to bbenbaruch

bbenbaruch: your comments don't match.  You say the statement isn't needed, but then go on to explain why it is needed.

Anyhow, I currently agree with haikuo that it isn't documented.

Regardless, to test your comments, I ran a rather hastily written macro that ran 10000 iterations of randomly reordering a simplified version of Linlin's example:

data have;

input age;

cards;

30

20

30

25

10

35

25

15

;

%macro test(iterations);

%let counter=0;

proc printto log="c:\art\junk";

run;

%do i=1 %to &iterations;

data have;

  set have;

  id=ranuni(0);

run;

proc sort data=have;

  by id;

run;

proc sql;

  create table want as

    select age,sum(age) as total_age

      from have

        group by age

  ;

quit;

data test;

  set want end=eof;

  test=lag(age);

  if test gt age then do;

    counter+1;

  end;

run;

proc sql noprint;

  select max(counter)+&counter into :counter

    from test

  ;

quit;

%end;

proc printto;

run;

%put &counter.;

%mend test;

%test(10000)

The results were in age order for each of the 10000 trials.

Respected Advisor
Posts: 3,156

Re: Is order by needed when group is used?

Hi Art,

I would have to echo with your concern after done some searching. Please see the official link for SAS proc SQL group by:

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473678.htm

It is a length page, so here is the quote:

"

  •   The data does not have to be sorted in the order of the group-by values because PROC SQL handles sorting automatically. You can use the ORDER BY clause to specify the order in which rows are displayed in the result table."

So it seems to me that proc SQL will do some sort of 'sorting' during the 'grouping' process, it makes sense that it is easier to group on sorted data (I have seen that group by clause works faster on sorted data), so sorting may just be a by-product of grouping, however, the what kind of sort is not guaranteed. And this impression can be further backed up by the following link if SAS using the same SQL engine as the Oracle/PLSQL:

http://www.techonthenet.com/oracle/questions/group_order.php

And the bottomeline is: we can't find any officail doc! Maybe this is why.

  

Regards,

Haikuo

Trusted Advisor
Posts: 1,022

Re: Is order by needed when group is used?

Art:

It may not be in the SAS docs so you would have to put it in to avoid relying on undocumented behavior.

OTOH, it's almost certainly costless (in terms of computing resources) because it is probably true (as per our shared belief) that GROUP BY has activated the sort.

Solution
‎09-13-2012 10:08 PM
Occasional Contributor
Posts: 9

Re: Is order by needed when group is used?

The SQL documentation -- ANSI not SAS -- specifies that the ordering is not guaranteed unless an order by statement is used.  So, for portable SQL, you should have the order by statement if you need the output sorted.  If SAS happens to produce an ordered output with a group, then it is a side effect of how it organizes the data in memory and then walks through the output.  It may produce a hash table and then walk the hash, effectively performing a hashsort.  If you want provably correct code, use the order by.


Super User
Posts: 10,028

Re: Is order by needed when group is used?

Arthur,

I think SQL will order itself defaultly.  That is the weakness .

PROC Star
Posts: 1,167

Re: Is order by needed when group is used?

It is a characteristic of SQL that if you don't specify ORDER BY, the result of GROUP BY is not defined. It's interesting that from your experiment it appears that SAS always returns it in order.

When I have used other SQL-based DBMS products, they don't order the results if you don't specify ORDER BY (I know this, as I frequently forget it, and get my results in a random order! Sigh...)

Tom

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 618 views
  • 6 likes
  • 8 in conversation