- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Take the following code:
proc sql;
create table want as
select age,sum(height) as total_height
from sashelp.class
group by age
;
quit;
It seems to always create a table in age order even though their isn't an order by statement. Is that documented anywhere?
Thanks in advance,
Art
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The SQL documentation -- ANSI not SAS -- specifies that the ordering is not guaranteed unless an order by statement is used. So, for portable SQL, you should have the order by statement if you need the output sorted. If SAS happens to produce an ordered output with a group, then it is a side effect of how it organizes the data in memory and then walks through the output. It may produce a hash table and then walk the hash, effectively performing a hashsort. If you want provably correct code, use the order by.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Art,
I think order by is not necessary.
example:
data have;
input id $ age;
cards;
e 30
d 20
e 30
d 25
a 10
e 35
d 25
a 15
;
proc sql;
create table want as
select id,age,sum(age) as total_age
from have
group by id,age
;
quit;
proc print;run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
But is that documented anywhere? If not, I'm concerned that it won't ALWAYS behave that way. I hate to depend on something that isn't guaranteed to always perform a certain way. The documentation says to use an 'order by' statement to insure expected order.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
ORDER BY is not necessary.
If there is no ORDER BY statement and your data come out ordered, then this is because this is the way they exist in your dataset.
For documentation, look at the SAS ORDER= statement documentation.
To test this for yourself, reorder your dataset and then rerun your PROC SQL statement.
BUT to do that, you need to be careful about how the data is structured after it is grouped. Your dataset is not ordered by age -- but is is AFTER it has been grouped by ID.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
bbenbaruch: your comments don't match. You say the statement isn't needed, but then go on to explain why it is needed.
Anyhow, I currently agree with haikuo that it isn't documented.
Regardless, to test your comments, I ran a rather hastily written macro that ran 10000 iterations of randomly reordering a simplified version of Linlin's example:
data have;
input age;
cards;
30
20
30
25
10
35
25
15
;
%macro test(iterations);
%let counter=0;
proc printto log="c:\art\junk";
run;
%do i=1 %to &iterations;
data have;
set have;
id=ranuni(0);
run;
proc sort data=have;
by id;
run;
proc sql;
create table want as
select age,sum(age) as total_age
from have
group by age
;
quit;
data test;
set want end=eof;
test=lag(age);
if test gt age then do;
counter+1;
end;
run;
proc sql noprint;
select max(counter)+&counter into :counter
from test
;
quit;
%end;
proc printto;
run;
%put &counter.;
%mend test;
%test(10000)
The results were in age order for each of the 10000 trials.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Art,
I would have to echo with your concern after done some searching. Please see the official link for SAS proc SQL group by:
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473678.htm
It is a length page, so here is the quote:
"
- The data does not have to be sorted in the order of the group-by values because PROC SQL handles sorting automatically. You can use the ORDER BY clause to specify the order in which rows are displayed in the result table."
So it seems to me that proc SQL will do some sort of 'sorting' during the 'grouping' process, it makes sense that it is easier to group on sorted data (I have seen that group by clause works faster on sorted data), so sorting may just be a by-product of grouping, however, the what kind of sort is not guaranteed. And this impression can be further backed up by the following link if SAS using the same SQL engine as the Oracle/PLSQL:
http://www.techonthenet.com/oracle/questions/group_order.php
And the bottomeline is: we can't find any officail doc! Maybe this is why.
Regards,
Haikuo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Art:
It may not be in the SAS docs so you would have to put it in to avoid relying on undocumented behavior.
OTOH, it's almost certainly costless (in terms of computing resources) because it is probably true (as per our shared belief) that GROUP BY has activated the sort.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The SQL documentation -- ANSI not SAS -- specifies that the ordering is not guaranteed unless an order by statement is used. So, for portable SQL, you should have the order by statement if you need the output sorted. If SAS happens to produce an ordered output with a group, then it is a side effect of how it organizes the data in memory and then walks through the output. It may produce a hash table and then walk the hash, effectively performing a hashsort. If you want provably correct code, use the order by.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Arthur,
I think SQL will order itself defaultly. That is the weakness .
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It is a characteristic of SQL that if you don't specify ORDER BY, the result of GROUP BY is not defined. It's interesting that from your experiment it appears that SAS always returns it in order.
When I have used other SQL-based DBMS products, they don't order the results if you don't specify ORDER BY (I know this, as I frequently forget it, and get my results in a random order! Sigh...)
Tom