Jack2012
Obsidian | Level 7

Dear all,

I would like to know the difference between the CLASS statement and the BY statement in PROC MEANS. Could anyone clarify this for me?

Accepted Solution
PGStats
Opal | Level 21

The CLASS and BY statements have similar effects, but there are some subtle differences. The documentation says:

Comparison of the BY and CLASS Statements

Using the BY statement is similar to using the CLASS statement and the NWAY option in that PROC MEANS summarizes each BY group as an independent subset of the input data. Therefore, no overall summarization of the input data is available. However, unlike the CLASS statement, the BY statement requires that you previously sort BY variables.
When you use the NWAY option, PROC MEANS might encounter insufficient memory for the summarization of all the class variables. You can move some class variables to the BY statement. For maximum benefit, move class variables to the BY statement that are already sorted or that have the greatest number of unique values.
You can use the CLASS and BY statements together to analyze the data by the levels of class variables within BY groups.
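A rough way to see why moving a class variable to the BY statement can save memory: with CLASS, every cell has to be kept until the whole data set has been read, whereas with BY only the cells for the current BY group are needed. The Python sketch below is a conceptual model under that assumption, not how PROC MEANS is actually implemented. The variables `region` and `product`, the data, and the function names are all hypothetical. It also shows CLASS and BY used together, summarizing product levels within each region.

```python
from itertools import groupby

# Hypothetical rows: (region, product, sales); region plays the BY variable.
ROWS = [
    ("East", "A", 10.0), ("East", "B", 5.0), ("East", "A", 7.0),
    ("West", "C", 3.0), ("West", "A", 4.0), ("West", "D", 6.0),
]


def nway_sums(rows, class_key):
    """NWAY-style CLASS summary: one running total per level, all in one dict."""
    cells = {}
    for row in rows:
        key = class_key(row)
        cells[key] = cells.get(key, 0.0) + row[2]
    return cells


def by_region_class_product(rows):
    """BY region + CLASS product: yields one BY group's cells at a time.

    Rows must already be sorted by region, as with a real BY statement.
    """
    for region, grp in groupby(rows, key=lambda r: r[0]):
        yield region, nway_sums(grp, class_key=lambda r: r[1])


# Both variables as CLASS variables: every region*product cell is live at once.
all_class = nway_sums(ROWS, class_key=lambda r: (r[0], r[1]))
# Region moved to BY: only the current region's product cells are live.
peak = max(len(cells) for _, cells in by_region_class_product(ROWS))
print(len(all_class), "cells vs. at most", peak, "cells per BY group")
```

With this toy data the difference is small, but in this model the saving grows with the number of BY levels, which matches the documentation's advice to move the class variables with the most unique values.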

Practically, this means that:


  • The input dataset must be sorted by the BY variables. It doesn't have to be sorted by the CLASS variables.

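  • Without the NWAY option in the PROC MEANS statement, the CLASS statement calculates summaries for the overall data, for each class variable separately, and for each possible combination of class variables. All of these appear in an output data set, although the printed table shows only the combination of all class variables unless you add the PRINTALLTYPES option. The BY statement only provides summaries for the groups created by the combination of all BY variables.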

  • The BY summaries are reported in separate tables (pages) whereas the CLASS summaries appear in a single table.

  • The MEANS procedure processes BY groups more efficiently than CLASS groups, because it only has to summarize one BY group at a time.
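The points above can be illustrated with a small conceptual model in Python. This is not SAS and not how PROC MEANS works internally. BY processing is modeled with `itertools.groupby`, which, like a BY statement, only forms one group per key value when the input is already sorted by that key. CLASS processing without NWAY is modeled by accumulating every subset of the class variables, including the empty subset (the grand total), in a dictionary. This needs no sort, but it keeps every group in memory. The rows and the functions `by_sex`, `by_means`, and `class_means` are hypothetical.

```python
from itertools import combinations, groupby
from statistics import mean

# Hypothetical input rows: (sex, age_group, height)
ROWS = [
    ("F", "teen", 62.0),
    ("M", "teen", 66.0),
    ("F", "child", 55.0),
    ("M", "child", 57.0),
    ("F", "teen", 64.0),
]


def by_sex(row):
    """Key function for the (hypothetical) BY variable sex."""
    return row[0]


def by_means(rows, key):
    """Model of BY processing: a single pass over contiguous runs of equal keys.

    Like a BY statement, this yields one group per key value only if the
    rows are already sorted by that key.
    """
    return [(k, mean(r[2] for r in grp)) for k, grp in groupby(rows, key=key)]


def class_means(rows, n_class=2):
    """Model of CLASS processing without NWAY.

    Every subset of the first n_class columns (including the empty subset,
    i.e. the grand total) is summarized, and all cells live in one dict.
    """
    sums = {}
    for row in rows:
        for size in range(n_class + 1):
            for idx in combinations(range(n_class), size):
                cell = (idx, tuple(row[i] for i in idx))
                total, count = sums.get(cell, (0.0, 0))
                sums[cell] = (total + row[-1], count + 1)
    return {cell: total / count for cell, (total, count) in sums.items()}


print(by_means(ROWS, by_sex))  # unsorted input: the "F" and "M" groups are split into pieces
print(by_means(sorted(ROWS, key=by_sex), by_sex))  # sorted input: one group per sex
cells = class_means(ROWS)
print(len(cells), cells[((), ())])  # all combinations, including the grand total, no sort needed
```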

Check out the example in the Base SAS(R) 9.3 Procedures Guide, Second Edition. I find it quite informative.

PG


6 Replies
data_null__
Jade | Level 19

I would look in the online documentation.

Jack2012
Obsidian | Level 7

I checked the SAS guide but still can't find an easy answer to this question. I hope you can explain it in an easy-to-understand way. Thanks.

RichardinOz
Quartz | Level 8

The CLASS statement is fantastic for getting subtotals as well as totals in an output data set (PROC SUMMARY is equivalent to PROC MEANS NOPRINT); using the DESCENDING option puts the grand total last.

It also saves having to do a sort first.
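The subtotals Richard mentions are labeled in the output data set by the automatic variable _TYPE_. This is a binary number with one bit per class variable, where the rightmost bit belongs to the last variable in the CLASS statement, so _TYPE_=0 is the grand total. The DESCENDING option orders the output observations by descending _TYPE_, which is why the grand total comes last. The Python sketch below only models that numbering for a hypothetical `CLASS A B;`. The function names are illustrative, not part of SAS.

```python
from itertools import combinations


def type_value(used, class_vars):
    """_TYPE_-style bitmask: one bit per class variable, rightmost bit = last variable."""
    value = 0
    for var in class_vars:
        value = (value << 1) | (1 if var in used else 0)
    return value


def all_types(class_vars):
    """Every subset of the class variables paired with its _TYPE_ value, ascending."""
    pairs = []
    for size in range(len(class_vars) + 1):
        for used in combinations(class_vars, size):
            pairs.append((type_value(used, class_vars), used))
    return sorted(pairs)


CLASS_VARS = ["A", "B"]  # hypothetical CLASS A B;
ascending = all_types(CLASS_VARS)
descending = list(reversed(ascending))  # DESCENDING: grand total (_TYPE_=0) comes last
for t, used in descending:
    print(t, " * ".join(used) or "(grand total)")
```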

BUT (big but): all the class running totals have to be held in memory (at least in earlier versions), which can be a problem if you have multiple class variables with many levels each. If you run out of memory, you may have to resort to a sort.
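A back-of-the-envelope count shows how quickly this grows. Assume k class variables with n_1, ..., n_k levels, and assume every combination of levels actually occurs in the data, which gives an upper bound. Then the number of cells across all combinations (types) of class variables is:

```latex
\sum_{S \subseteq \{1,\dots,k\}} \prod_{i \in S} n_i \;=\; \prod_{i=1}^{k} \left(1 + n_i\right)
```

For example, three class variables with 100 levels each give at most 101^3 = 1,030,301 cells, compared with 100^3 = 1,000,000 for the full (NWAY) combination alone. If one of them is moved to the BY statement, each BY group involves at most 101^2 = 10,201 cells.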

Richard

Rick_SAS
SAS Super FREQ

As PG points out, "The BY summaries are reported in separate tables (pages) whereas the CLASS summaries appear in a single table."

To me, this is the biggest difference. If I want to compare statistics across categories (such as male/female), I use the CLASS statement whenever I want one table that contains all of the statistics, which is most of the time.

ChrisHemedinger
Community Manager

As Rick said, when comparing stats I usually go for the CLASS statement. As others have pointed out, BY requires the input data to be sorted. However, if your input data comes from a database library (and not a SAS data set), PROC MEANS is smart enough to request the data in "sorted" order before calculating the stats, so a separate PROC SORT step is not necessary. In fact, in SAS 9.2 and later, PROC MEANS will get the database to calculate the basic stats (Avg, Sum, Min, Max, and so on) if it can, thus saving lots of I/O within your SAS session.

Besides MEANS/SUMMARY, the "database-aware" procs include FREQ, RANK, REPORT, and TABULATE. See this SAS Note for more information.

Chris
