BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
art297
Opal | Level 21

Take the following code:

proc sql;

  create table want as

    select age,sum(height) as total_height

      from sashelp.class

        group by age

  ;

quit;

It seems to always create a table in age order even though their isn't an order by statement.  Is that documented anywhere?

Thanks in advance,

Art

1 ACCEPTED SOLUTION

Accepted Solutions
Chasville
Calcite | Level 5

The SQL documentation -- ANSI not SAS -- specifies that the ordering is not guaranteed unless an order by statement is used.  So, for portable SQL, you should have the order by statement if you need the output sorted.  If SAS happens to produce an ordered output with a group, then it is a side effect of how it organizes the data in memory and then walks through the output.  It may produce a hash table and then walk the hash, effectively performing a hashsort.  If you want provably correct code, use the order by.


View solution in original post

9 REPLIES 9
Linlin
Lapis Lazuli | Level 10

Hi Art,

I think order by is not necessary.

example:

data have;

input id $ age;

cards;

e 30

d 20

e 30

d 25

a 10

e 35

d 25

a 15

;

proc sql;

  create table want as

    select id,age,sum(age) as total_age

      from have

        group by id,age

  ;

quit;

proc print;run;

art297
Opal | Level 21

But is that documented anywhere?  If not, I'm concerned that it won't ALWAYS behave that way.  I hate to depend on something that isn't guaranteed to always perform a certain way.  The documentation says to use an 'order by' statement to insure expected order.

bbenbaruch
Quartz | Level 8

ORDER BY is not necessary.

If there is no ORDER BY statement and your data come out ordered, then this is because this is the way they exist in your dataset.

For documentation, look at the SAS ORDER= statement documentation.

To test this for yourself, reorder your dataset and then rerun your PROC SQL statement.

BUT to do that, you need to be careful about how the data is structured after it is grouped.  Your dataset is not ordered by age -- but is is AFTER it has been grouped by ID.

art297
Opal | Level 21

bbenbaruch: your comments don't match.  You say the statement isn't needed, but then go on to explain why it is needed.

Anyhow, I currently agree with haikuo that it isn't documented.

Regardless, to test your comments, I ran a rather hastily written macro that ran 10000 iterations of randomly reordering a simplified version of Linlin's example:

data have;

input age;

cards;

30

20

30

25

10

35

25

15

;

%macro test(iterations);

%let counter=0;

proc printto log="c:\art\junk";

run;

%do i=1 %to &iterations;

data have;

  set have;

  id=ranuni(0);

run;

proc sort data=have;

  by id;

run;

proc sql;

  create table want as

    select age,sum(age) as total_age

      from have

        group by age

  ;

quit;

data test;

  set want end=eof;

  test=lag(age);

  if test gt age then do;

    counter+1;

  end;

run;

proc sql noprint;

  select max(counter)+&counter into :counter

    from test

  ;

quit;

%end;

proc printto;

run;

%put &counter.;

%mend test;

%test(10000)

The results were in age order for each of the 10000 trials.

Haikuo
Onyx | Level 15

Hi Art,

I would have to echo with your concern after done some searching. Please see the official link for SAS proc SQL group by:

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473678.htm

It is a length page, so here is the quote:

"

  •   The data does not have to be sorted in the order of the group-by values because PROC SQL handles sorting automatically. You can use the ORDER BY clause to specify the order in which rows are displayed in the result table."

So it seems to me that proc SQL will do some sort of 'sorting' during the 'grouping' process, it makes sense that it is easier to group on sorted data (I have seen that group by clause works faster on sorted data), so sorting may just be a by-product of grouping, however, the what kind of sort is not guaranteed. And this impression can be further backed up by the following link if SAS using the same SQL engine as the Oracle/PLSQL:

http://www.techonthenet.com/oracle/questions/group_order.php

And the bottomeline is: we can't find any officail doc! Maybe this is why.

  

Regards,

Haikuo

mkeintz
PROC Star

Art:

It may not be in the SAS docs so you would have to put it in to avoid relying on undocumented behavior.

OTOH, it's almost certainly costless (in terms of computing resources) because it is probably true (as per our shared belief) that GROUP BY has activated the sort.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Chasville
Calcite | Level 5

The SQL documentation -- ANSI not SAS -- specifies that the ordering is not guaranteed unless an order by statement is used.  So, for portable SQL, you should have the order by statement if you need the output sorted.  If SAS happens to produce an ordered output with a group, then it is a side effect of how it organizes the data in memory and then walks through the output.  It may produce a hash table and then walk the hash, effectively performing a hashsort.  If you want provably correct code, use the order by.


Ksharp
Super User

Arthur,

I think SQL will order itself defaultly.  That is the weakness .

TomKari
Onyx | Level 15

It is a characteristic of SQL that if you don't specify ORDER BY, the result of GROUP BY is not defined. It's interesting that from your experiment it appears that SAS always returns it in order.

When I have used other SQL-based DBMS products, they don't order the results if you don't specify ORDER BY (I know this, as I frequently forget it, and get my results in a random order! Sigh...)

Tom

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 4433 views
  • 6 likes
  • 8 in conversation