Take the following code:
proc sql;
create table want as
select age,sum(height) as total_height
from sashelp.class
group by age
;
quit;
It seems to always create a table in age order even though their isn't an order by statement. Is that documented anywhere?
Thanks in advance,
Art
The SQL documentation -- ANSI not SAS -- specifies that the ordering is not guaranteed unless an order by statement is used. So, for portable SQL, you should have the order by statement if you need the output sorted. If SAS happens to produce an ordered output with a group, then it is a side effect of how it organizes the data in memory and then walks through the output. It may produce a hash table and then walk the hash, effectively performing a hashsort. If you want provably correct code, use the order by.
Hi Art,
I think order by is not necessary.
example:
data have;
input id $ age;
cards;
e 30
d 20
e 30
d 25
a 10
e 35
d 25
a 15
;
proc sql;
create table want as
select id,age,sum(age) as total_age
from have
group by id,age
;
quit;
proc print;run;
But is that documented anywhere? If not, I'm concerned that it won't ALWAYS behave that way. I hate to depend on something that isn't guaranteed to always perform a certain way. The documentation says to use an 'order by' statement to insure expected order.
ORDER BY is not necessary.
If there is no ORDER BY statement and your data come out ordered, then this is because this is the way they exist in your dataset.
For documentation, look at the SAS ORDER= statement documentation.
To test this for yourself, reorder your dataset and then rerun your PROC SQL statement.
BUT to do that, you need to be careful about how the data is structured after it is grouped. Your dataset is not ordered by age -- but is is AFTER it has been grouped by ID.
bbenbaruch: your comments don't match. You say the statement isn't needed, but then go on to explain why it is needed.
Anyhow, I currently agree with haikuo that it isn't documented.
Regardless, to test your comments, I ran a rather hastily written macro that ran 10000 iterations of randomly reordering a simplified version of Linlin's example:
data have;
input age;
cards;
30
20
30
25
10
35
25
15
;
%macro test(iterations);
%let counter=0;
proc printto log="c:\art\junk";
run;
%do i=1 %to &iterations;
data have;
set have;
id=ranuni(0);
run;
proc sort data=have;
by id;
run;
proc sql;
create table want as
select age,sum(age) as total_age
from have
group by age
;
quit;
data test;
set want end=eof;
test=lag(age);
if test gt age then do;
counter+1;
end;
run;
proc sql noprint;
select max(counter)+&counter into :counter
from test
;
quit;
%end;
proc printto;
run;
%put &counter.;
%mend test;
%test(10000)
The results were in age order for each of the 10000 trials.
Hi Art,
I would have to echo with your concern after done some searching. Please see the official link for SAS proc SQL group by:
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473678.htm
It is a length page, so here is the quote:
"
So it seems to me that proc SQL will do some sort of 'sorting' during the 'grouping' process, it makes sense that it is easier to group on sorted data (I have seen that group by clause works faster on sorted data), so sorting may just be a by-product of grouping, however, the what kind of sort is not guaranteed. And this impression can be further backed up by the following link if SAS using the same SQL engine as the Oracle/PLSQL:
http://www.techonthenet.com/oracle/questions/group_order.php
And the bottomeline is: we can't find any officail doc! Maybe this is why.
Regards,
Haikuo
Art:
It may not be in the SAS docs so you would have to put it in to avoid relying on undocumented behavior.
OTOH, it's almost certainly costless (in terms of computing resources) because it is probably true (as per our shared belief) that GROUP BY has activated the sort.
The SQL documentation -- ANSI not SAS -- specifies that the ordering is not guaranteed unless an order by statement is used. So, for portable SQL, you should have the order by statement if you need the output sorted. If SAS happens to produce an ordered output with a group, then it is a side effect of how it organizes the data in memory and then walks through the output. It may produce a hash table and then walk the hash, effectively performing a hashsort. If you want provably correct code, use the order by.
Arthur,
I think SQL will order itself defaultly. That is the weakness .
It is a characteristic of SQL that if you don't specify ORDER BY, the result of GROUP BY is not defined. It's interesting that from your experiment it appears that SAS always returns it in order.
When I have used other SQL-based DBMS products, they don't order the results if you don't specify ORDER BY (I know this, as I frequently forget it, and get my results in a random order! Sigh...)
Tom
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.