Hello
I would like to understand more clearly what is the difference between the following procedures regarding the default order .
I am talking in a case when I am using user defined format to group by values.
Proc SQL-As I understand the default is order by Formatted values .Is it true? In order to order by unformatted values FreelanceReinha
PROC FREQ-As I understand the default is order by unformatted values and then no need order statement in this example .Is it true?
Proc Tabulate-As I understand the default is order by unformatted values and then no need order statement in this example .Is it true?
PROC SUMMARY/MEANS- What is the default sort? Why is it needed to add ORDER=DATA ?
Proc REPORT- What is the default sort? Why is it needed to add ORDER=DATA ?
Data have;
input ID age;
cards;
1 15
2 17
3 18
4 21
5 22
6 19
7 19
8 31
9 39
10 48
11 54
12 17
13 18
;
Run;
proc format;
value agefmt
low-18='TILL 18'
18-19='18-19'
19-21='19-21'
21-high='21+'
;
Run;
/***Way1-Proc FREQ***/
/**No need order statement* in this example*/
proc freq data=have;
format age agefmt.;
tables age / nocum MISSING out=want;
run;
/**Way2-proc sql with using trick in order statement**/
PROC SQL;
select put(age,agefmt.) as age ,
count(*) as nr ,
calculated nr/(select count(*) as total_nr from have)*100 as pctN
from have
group by calculated age
;
QUIT;
/**Way3-Proc summary/means**/
/**Using order=data***/
proc means data=have n percent;
format age agefmt.;
class age / order=data;
run;
/**Way4-Proc REPORT**/
/**Using order=data***/
proc report data=have;
column age id n percent;
define age / group format=agefmt. order=data;
define id / noprint;
define percent / computed format=percent7.2;
compute before;
all = n;
endcomp;
compute percent;
percent = n / all;
endcomp;
run;
/*WAY5-Proc tabulate***/
/**No need order statement* in this example*/
proc tabulate data=have;
format age agefmt.;
class age;
tables age,(n pctn);
run;
You can trick SQL to create the desired order by using a two-step formatting approach:
proc format;
value agegrp
low-18 = "1"
18-19 = "2"
19-21 = "3"
21-high = "4"
;
value $agefmt
"1" = 'TILL 18'
"2" = '18-19'
"3" = '19-21'
"4" = '21+'
;
run;
proc sql;
select
put(age,agegrp.) as age format=$agefmt.,
count(*) as nr ,
calculated nr/(select count(*) as total_nr from have)*100 as pctN
from have
group by calculated age
;
quit;
The default sort order is documented per Proc. The default is normally "internal" meaning the values without formats when it's data and sorted by the printed values (formatted) when it's a report.
With SQL don't assume that there is any sort order unless you define explicitly an Order By clause. Sorting will be by internal unformatted value. If you want it by formatted value then assign the format in the Order By clause (see below).
proc format;
value ifmt
1,7=1
other=0
;
run;
data have;
do i=1,5,3,7,6;
k=i;
output;
end;
format k ifmt.;
run;
proc sql;
select *
from have h
order by k
;
select *
from have h
order by put(k,ifmt.)
;
quit;
@Ronein Because not everything is 100% aligned between the Proc's. There was at one time some effort spent by SAS to "streamline" the keywords between Proc's but changing default sort order would certainly have been an issue - so you just have to learn what's the default by Proc or even better always define it explicitly.
Your SQL does not use "formatted values". You create a string and use that in the GROUP BY, so SQL will sort the output according to the collating order. The string variable will have a simple $ format assigned to it (if any), so no real formatting takes place.
SQL orders on raw values. If you (e.g.) have a date variable with a DDMMYY format, it will still be sorted chronologically in the output.
MEANS and FREQ will order by raw value, but group by formatted values. See
proc means data=have n;
format age agefmt.;
class age;
run;
REPORT will order by formatted values, so you need to use ORDER=DATA there.
TABULATE acts like FREQ and MEANS.
Whenever you have such questions, follow Maxim 4. There is no more satisfying knowledge gain than the one you acquire yourself.
@Ronein wrote:
Thanks,
You say that tabulate acts like FREQ and MEANS.
So, why there is order=Data in proc tabulate and there is no in proc freq/proc means/proc summary?
ORDER=DATA is an option in PROC FREQ and PROC MEANS/SUMMARY.
@Ronein wrote:
Thanks,
You say that tabulate acts like FREQ and MEANS.
So, why there is order=Data in proc tabulate and there is no in proc freq/proc means/proc summary?
From documentation for Proc Freq:
Table 3.4: PROC FREQ Statement Options
Option |
Description |
---|---|
Begins the next one-way table on the current page |
|
Names the input data set |
|
Specifies the outline and cell divider characters for crosstabulation tables |
|
Displays the number of levels for all TABLES variables |
|
Suppresses all displayed output |
|
Specifies the order for reporting variable values |
|
Displays one table per page |
From the documentation for the Proc Means statement output control options:
specifies the field width for the statistics.
specifies the number of decimal places for the statistics.
suppresses reporting the total number of observations for each unique combination of the class variables.
suppresses all displayed output.
orders the values of the class variables according to the specified order.
displays the output.
displays the analysis for all requested combinations of class variables.
displays the values of the ID variables.
produces an ODS output object
You can trick SQL to create the desired order by using a two-step formatting approach:
proc format;
value agegrp
low-18 = "1"
18-19 = "2"
19-21 = "3"
21-high = "4"
;
value $agefmt
"1" = 'TILL 18'
"2" = '18-19'
"3" = '19-21'
"4" = '21+'
;
run;
proc sql;
select
put(age,agegrp.) as age format=$agefmt.,
count(*) as nr ,
calculated nr/(select count(*) as total_nr from have)*100 as pctN
from have
group by calculated age
;
quit;
Very nice and usefull solution for people who prefer to work with proc sql
PROC SUMMARY/MEANS- What is the default sort? Why is it needed to add ORDER=DATA ?
Proc REPORT- What is the default sort? Why is it needed to add ORDER=DATA ?
In PROC MEANS/SUMMARY and PROC REPORT and PROC FREQ, you want ORDER=INTERNAL, not ORDER=DATA.
You probably want ORDER=INTERNAL which uses the unformatted value to determine how the categories will sort. You probably don't want ORDER=DATA uses the order that the results appear in the data, so for example if we are talking about months and month 9 is formatted as SEP2021 and this appears in the first row of the data set, then SEP2021 will be the first in the sort order because it appears first in the data.
What is the default? This is clearly stated in the documentation for each PROC where the ORDER= option is described.
@Ronein wrote:
Great! So it was mistake to write order=data unless there is proc sort before...
Proc Sort by the category variable would be required, not Proc Sort by any other variable. But sorting is unnecessary if the next step is PROC FREQ/PROC MEANS/PROC REPORT/PROC SUMMARY, don't waste your time. (Sorting may be necessary for other reasons in your program)
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.