What is the difference between the CLASS and BY statements in PROC MEANS?

Frequent Contributor
Posts: 88

Dear all,

I would like to know the difference between the CLASS statement and the BY statement in PROC MEANS. Could anyone clarify this for me?


Accepted Solutions
Solution
‎05-02-2013 09:45 PM
Respected Advisor
Posts: 4,930

Re: What is the difference between Class and by statement in PROC MEANS?

CLASS and BY statements have similar effects but there are some subtle differences. In the documentation it says:

Comparison of the BY and CLASS Statements

Using the BY statement is similar to using the CLASS statement and the NWAY option in that PROC MEANS summarizes each BY group as an independent subset of the input data. Therefore, no overall summarization of the input data is available. However, unlike the CLASS statement, the BY statement requires that you previously sort BY variables.
When you use the NWAY option, PROC MEANS might encounter insufficient memory for the summarization of all the class variables. You can move some class variables to the BY statement. For maximum benefit, move class variables to the BY statement that are already sorted or that have the greatest number of unique values.
You can use the CLASS and BY statements together to analyze the data by the levels of class variables within BY groups.
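
As a rough sketch of the combined form the documentation describes, the step below uses the SASHELP.CLASS sample data set (an assumption; it is not mentioned in this thread), with Sex as the BY variable and Age as the CLASS variable. The data set name work.class_by_sex is made up for illustration.

```sas
/* Sort by the BY variable only; the CLASS variable needs no sort. */
proc sort data=sashelp.class out=work.class_by_sex;
  by sex;
run;

/* One table per value of Sex (BY group);
   within each table, one row per value of Age (CLASS level). */
proc means data=work.class_by_sex n mean;
  by sex;
  class age;
  var height;
run;
```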

Practically, this means that:


  • The input dataset must be sorted by the BY variables. It doesn't have to be sorted by the CLASS variables.

  • Without the NWAY option in the PROC MEANS statement, the CLASS statement calculates summaries for each class variable separately, for each possible combination of class variables, and for the data as a whole. All of these appear in the output data set; the printed output shows only the full combination unless you add PRINTALLTYPES. The BY statement only provides summaries for the groups created by the combination of all BY variables.

  • The BY summaries are reported in separate tables (pages) whereas the CLASS summaries appear in a single table.

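  • The MEANS procedure generally needs less memory to process BY groups than CLASS groups, because each BY group is summarized as an independent subset, whereas the CLASS combinations are held in memory together. The sort that BY requires has its own cost.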
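
The points above can be tried side by side. This is a minimal sketch, assuming the SASHELP.CLASS sample data set that ships with SAS (it is not part of the original question); the name work.class_sorted is only illustrative.

```sas
/* BY: the input must be sorted by Sex first. */
proc sort data=sashelp.class out=work.class_sorted;
  by sex;
run;

/* Prints a separate table for each value of Sex. */
proc means data=work.class_sorted n mean min max;
  by sex;
  var height weight;
run;

/* CLASS: no sort needed; prints one table with the
   statistics for each value of Sex. */
proc means data=sashelp.class n mean min max;
  class sex;
  var height weight;
run;

/* Two CLASS variables plus PRINTALLTYPES: the printed output
   also includes the lower-order summaries (Sex alone, Age alone),
   not only the Sex*Age combinations. */
proc means data=sashelp.class n mean printalltypes;
  class sex age;
  var height;
run;
```

Running the second step against the unsorted SASHELP.CLASS data with BY SEX would fail with an error saying the data set is not sorted, which is the practical meaning of the first bullet.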

Check out the example in the Base SAS(R) 9.3 Procedures Guide, Second Edition. I find it quite informative.

PG


All Replies
Respected Advisor
Posts: 3,799

I would look in the online documentation.

Frequent Contributor
Posts: 88

Posted in reply to data_null__

I checked the SAS guide, but I still can't find an easy answer to this question. I hope you can explain it in a way that is easy to understand. Thanks.

Super Contributor
Posts: 644

The CLASS statement is fantastic for getting subtotals as well as totals in an output dataset (PROC SUMMARY is essentially PROC MEANS with NOPRINT). Using the DESCENDING option puts the grand total last.

It also saves having to do a sort first.

BUT, and it is a big but, all the class running totals have to be held in memory (at least in earlier versions), which can be a problem if you have multiple class variables with many levels each. If you run out of memory you may have to resort to sorting and a BY statement.
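
A small sketch of the subtotal idea, assuming the SASHELP.CLASS sample data set; the output data set and variable names (work.height_stats, mean_height, n_height) are invented for the example.

```sas
/* PROC SUMMARY does not print by default; the results go to
   the OUTPUT data set. DESCENDING orders that data set by
   descending _TYPE_. */
proc summary data=sashelp.class descending;
  class sex age;
  var height;
  output out=work.height_stats mean=mean_height n=n_height;
run;

/* _TYPE_ identifies the level of each row:
     3 = Sex*Age, 2 = Sex only, 1 = Age only, 0 = grand total.
   With DESCENDING, the _TYPE_=0 row comes last. */
proc print data=work.height_stats noobs;
run;
```

Adding NWAY to the PROC SUMMARY statement would keep only the _TYPE_=3 rows, which is the set of groups a BY statement on both variables would produce.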

Richard

SAS Super FREQ
Posts: 3,755

As PG points out, "The BY summaries are reported in separate tables (pages) whereas the CLASS summaries appear in a single table."

To me, this is the biggest difference.  If I want to compare statistics across categories (such as male/female), I will use the CLASS statement when I want one table that contains all of the statistics (which is most of the time).

Community Manager
Posts: 2,955

As said, when comparing stats I usually go for the CLASS statement.  As others have pointed out, BY requires the input data to be sorted.  However, if your input data is coming from a database library (and not a SAS data set), PROC MEANS is smart enough to ask for the data in "sorted" order before calculating the stats, so a separate PROC SORT step is not necessary.  In fact, with SAS 9.2 and later, PROC MEANS will actually get the database to calculate the basic stats if it can: Avg, Sum, Min, Max and so on -- thus saving lots of I/O within your SAS session.
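
To see this in practice, something like the following could be tried. It is only a sketch: it assumes a SAS/ACCESS engine is licensed, and the libref, engine, credentials, table, and column names (mydb, oracle, sales, region, amount) are all placeholders, not anything from this thread. Whether the statistics are actually computed in the database depends on the engine and option settings; the trace in the log shows what SQL was sent.

```sas
/* Placeholder connection: engine, credentials, and schema are hypothetical. */
libname mydb oracle user=myuser password=mypass path=mypath schema=myschema;

/* Write the SQL that SAS sends to the database to the SAS log. */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Allow SAS to generate SQL for the database where it can. */
options sqlgeneration=dbms;

/* No PROC SORT step here: the table lives in the database,
   so it is not sorted in the SAS session first. */
proc means data=mydb.sales n mean sum min max;
  by region;
  var amount;
run;
```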

The "database-aware" procs include FREQ, RANK, SUMMARY/MEANS, REPORT, and TABULATE.  See this SAS Note for more information.

Chris
