BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

I would like to understand more clearly what is the difference between the following procedures regarding the default order .

I am talking in a case when I am using user defined format  to group by values.

 

Proc SQL-As I understand the default is order by Formatted values .Is it true? In order to order by unformatted values FreelanceReinhard showed a trick (order by age net 'T',age)

 

PROC FREQ-As I understand the default is order by unformatted values and then no need order statement in this example .Is it true?

 

Proc Tabulate-As I understand the default is order by unformatted values and then no need order statement in this example .Is it true?

 

PROC SUMMARY/MEANS- What is the default sort? Why is it needed to add  ORDER=DATA ?

 

Proc REPORT- What is the default sort? Why is it needed to add  ORDER=DATA ?

 

 

 

Data have;
input ID age;
cards;
1 15
2 17
3 18
4 21
5 22
6 19
7 19
8 31
9 39
10 48
11 54
12 17
13 18
;
Run;


proc format;
value agefmt
low-18='TILL 18'
18-19='18-19'
19-21='19-21'
21-high='21+'
;
Run;

/***Way1-Proc FREQ***/
/**No need order statement* in this example*/
proc freq data=have;
format age agefmt.;
tables age / nocum MISSING out=want;
run;

/**Way2-proc sql with using trick in order statement**/
PROC SQL;
	select put(age,agefmt.) as age ,
          count(*) as nr  ,
          calculated nr/(select count(*) as  total_nr from  have)*100 as pctN 
	from  have
	group by  calculated age 
;
QUIT;

/**Way3-Proc summary/means**/
/**Using order=data***/
proc means data=have n percent;
format age agefmt.;
class age / order=data;
run;


/**Way4-Proc REPORT**/
/**Using order=data***/
proc report data=have;
column age id n percent;
define age / group format=agefmt. order=data;
define id / noprint;
define percent / computed format=percent7.2;
compute before;
all = n;
endcomp;
compute percent;
percent = n / all;
endcomp;
run;


/*WAY5-Proc tabulate***/
/**No need order statement* in this example*/
proc tabulate data=have;
format age agefmt.;
class age;
tables age,(n pctn);
run;


 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

You can trick SQL to create the desired order by using a two-step formatting approach:

proc format;
value agegrp
  low-18 = "1"
  18-19 = "2"
  19-21 = "3"
  21-high = "4"
;
value $agefmt 
  "1" = 'TILL 18'
  "2" = '18-19'
  "3" = '19-21'
  "4" = '21+'
;
run;

proc sql;
select
  put(age,agegrp.) as age format=$agefmt.,
  count(*) as nr  ,
  calculated nr/(select count(*) as  total_nr from  have)*100 as pctN 
  from  have
  group by  calculated age 
;
quit;

View solution in original post

12 REPLIES 12
Patrick
Opal | Level 21

The default sort order is documented per Proc. The default is normally "internal" meaning the values without formats when it's data and sorted by the printed values (formatted) when it's a report.

 

With SQL don't assume that there is any sort order unless you define explicitly an Order By clause. Sorting will be by internal unformatted value. If you want it by formatted value then assign the format in the Order By clause (see below).

proc format;
  value ifmt
    1,7=1
    other=0
    ;
run;

data have;
  do i=1,5,3,7,6;
    k=i;
    output;
  end;
  format k ifmt.;
run;

proc sql;
  select *
  from have h
  order by k
  ;
  select *
  from have h
  order by put(k,ifmt.)
  ;
quit;

Patrick_0-1634290503739.png

 

Ronein
Meteorite | Level 14
Why in proc summary and proc report need to use order=data whereas in proc tabulate and proc summary no need?(related to the example I showed)
Patrick
Opal | Level 21

@Ronein Because not everything is 100% aligned between the Proc's. There was at one time some effort spent by SAS to "streamline" the keywords between Proc's but changing default sort order would certainly have been an issue - so you just have to learn what's the default by Proc or even better always define it explicitly. 

Kurt_Bremser
Super User

Your SQL does not use "formatted values". You create a string and use that in the GROUP BY, so SQL will sort the output according to the collating order. The string variable will have a simple $ format assigned to it (if any), so no real formatting takes place.

SQL orders on raw values. If you (e.g.) have a date variable with a DDMMYY format, it will still be sorted chronologically in the output.

 

MEANS and FREQ will order by raw value, but group by formatted values. See

proc means data=have n;
format age agefmt.;
class age;
run;

REPORT will order by formatted values, so you need to use ORDER=DATA there.

TABULATE acts like FREQ and MEANS.

 

Whenever you have such questions, follow Maxim 4. There is no more satisfying knowledge gain than the one you acquire yourself.

Ronein
Meteorite | Level 14
Thanks,
You say that tabulate acts like FREQ and MEANS.
So, why there is order=Data in proc tabulate and there is no in proc freq/proc means/proc summary?
PaigeMiller
Diamond | Level 26

@Ronein wrote:
Thanks,
You say that tabulate acts like FREQ and MEANS.
So, why there is order=Data in proc tabulate and there is no in proc freq/proc means/proc summary?

ORDER=DATA is an option in PROC FREQ and PROC MEANS/SUMMARY.

--
Paige Miller
ballardw
Super User

@Ronein wrote:
Thanks,
You say that tabulate acts like FREQ and MEANS.
So, why there is order=Data in proc tabulate and there is no in proc freq/proc means/proc summary?

From documentation for Proc Freq:

Table 3.4: PROC FREQ Statement Options

Option

Description

COMPRESS

Begins the next one-way table on the current page

DATA=

Names the input data set

FORMCHAR=

Specifies the outline and cell divider characters for crosstabulation tables

NLEVELS

Displays the number of levels for all TABLES variables

NOPRINT

Suppresses all displayed output

ORDER=

Specifies the order for reporting variable values

PAGE

Displays one table per page

 

From the documentation for the Proc Means statement output control options:

Control the output

FW=field-width

specifies the field width for the statistics.

MAXDEC=number

specifies the number of decimal places for the statistics.

NONOBS

suppresses reporting the total number of observations for each unique combination of the class variables.

NOPRINT

suppresses all displayed output.

ORDER=DATA | FORMATTED | FREQ | UNFORMATTED

orders the values of the class variables according to the specified order.

PRINT | NOPRINT

displays the output.

PRINTALLTYPES

displays the analysis for all requested combinations of class variables.

PRINTIDVARS

displays the values of the ID variables.

STACKODSOUTPUT

produces an ODS output object

 

 

 

Kurt_Bremser
Super User

You can trick SQL to create the desired order by using a two-step formatting approach:

proc format;
value agegrp
  low-18 = "1"
  18-19 = "2"
  19-21 = "3"
  21-high = "4"
;
value $agefmt 
  "1" = 'TILL 18'
  "2" = '18-19'
  "3" = '19-21'
  "4" = '21+'
;
run;

proc sql;
select
  put(age,agegrp.) as age format=$agefmt.,
  count(*) as nr  ,
  calculated nr/(select count(*) as  total_nr from  have)*100 as pctN 
  from  have
  group by  calculated age 
;
quit;
Ronein
Meteorite | Level 14

Very nice and usefull solution for people who prefer to work with proc sql

PaigeMiller
Diamond | Level 26

PROC SUMMARY/MEANS- What is the default sort? Why is it needed to add  ORDER=DATA ?

 

Proc REPORT- What is the default sort? Why is it needed to add  ORDER=DATA ?

 

In PROC MEANS/SUMMARY and PROC REPORT and PROC FREQ, you want ORDER=INTERNAL, not ORDER=DATA.

 

You probably want ORDER=INTERNAL which uses the unformatted value to determine how the categories will sort. You probably don't want ORDER=DATA uses the order that the results appear in the data, so for example if we are talking about months and month 9 is formatted as SEP2021 and this appears in the first row of the data set, then SEP2021 will be the first in the sort order because it appears first in the data.

 

What is the default? This is clearly stated in the documentation for each PROC where the ORDER= option is described.

--
Paige Miller
Ronein
Meteorite | Level 14
Great! So it was mistake to write order=data unless there is proc sort before...
PaigeMiller
Diamond | Level 26

@Ronein wrote:
Great! So it was mistake to write order=data unless there is proc sort before...

Proc Sort by the category variable would be required, not Proc Sort by any other variable. But sorting is unnecessary if the next step is PROC FREQ/PROC MEANS/PROC REPORT/PROC SUMMARY, don't waste your time. (Sorting may be necessary for other reasons in your program)

--
Paige Miller

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 937 views
  • 4 likes
  • 5 in conversation