Dear all,
I would like to know the difference between the CLASS statement and the BY statement in PROC MEANS. Could anyone clarify this for me?
CLASS and BY statements have similar effects but there are some subtle differences. In the documentation it says:
Comparison of the BY and CLASS Statements
Using the BY statement is similar to using the CLASS statement and the NWAY option in that PROC MEANS summarizes each BY group as an independent subset of the input data. Therefore, no overall summarization of the input data is available. However, unlike the CLASS statement, the BY statement requires that you previously sort BY variables.
When you use the NWAY option, PROC MEANS might encounter insufficient memory for the summarization of all the class variables. You can move some class variables to the BY statement. For maximum benefit, move class variables to the BY statement that are already sorted or that have the greatest number of unique values.
You can use the CLASS and BY statements together to analyze the data by the levels of class variables within BY groups.
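Practically, this means that the BY summaries are reported in separate tables (pages), whereas the CLASS summaries appear in a single table.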
Check out the example
Base SAS(R) 9.3 Procedures Guide, Second Edition
I find it quite informative.
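To make the comparison concrete, here is a minimal sketch. It assumes the SASHELP.CLASS sample data set that ships with SAS (my choice for illustration, not the documentation's example), and WORK.CLASS_SORTED is just a name made up for the sorted copy.

```sas
/* BY requires sorted input, so make a sorted copy first.        */
/* WORK.CLASS_SORTED is a hypothetical name used for this sketch. */
proc sort data=sashelp.class out=work.class_sorted;
  by sex;
run;

/* BY: each value of Sex is summarized as an independent subset   */
/* and reported in its own table.                                 */
proc means data=work.class_sorted n mean min max;
  by sex;
  var height weight;
run;

/* CLASS: no sort needed; all levels of Sex appear in one table.  */
proc means data=sashelp.class n mean min max;
  class sex;
  var height weight;
run;
```

Both steps compute the same statistics per group; the difference is in the sort requirement and in how the results are laid out.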
PG
I would look in the online documentation.
I checked the SAS guide, but I still can't find an easy answer to this question. I hope you can explain it in an easy-to-understand way. Thanks
The CLASS statement is fantastic for getting subtotals as well as totals in an output data set (PROC SUMMARY is equivalent to PROC MEANS NOPRINT). Using the DESCENDING option puts the grand total last.
It also saves having to do a sort first.
BUT - big but - all the class running totals have to be held in memory (at least in earlier versions), which can be a problem if you have multiple class variables with many levels each. If you run out of memory, you may have to sort the data and move some variables to a BY statement.
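A rough sketch of both points, assuming the SASHELP.CLASS sample data set; the output data set names, the sorted copy and MEAN_HEIGHT are made up for illustration.

```sas
/* CLASS without NWAY: the output data set holds every level of   */
/* summarization. With CLASS SEX AGE, _TYPE_ is 3 for Sex*Age,    */
/* 2 for Sex only, 1 for Age only and 0 for the grand total.      */
/* DESCENDING orders the rows by descending _TYPE_, so the grand  */
/* total comes last.                                              */
proc means data=sashelp.class noprint descending;
  class sex age;
  var height;
  output out=work.stats mean=mean_height;
run;

proc print data=work.stats;
run;

/* If memory is a concern, move one class variable to BY.         */
/* This needs sorted input and gives no totals across Sex.        */
proc sort data=sashelp.class out=work.class_sorted;
  by sex;
run;

proc means data=work.class_sorted noprint nway;
  by sex;
  class age;
  var height;
  output out=work.stats_by mean=mean_height;
run;
```

The second step keeps only the Age cells within each Sex group, so it trades the subtotals and grand total for a smaller set of running totals in memory.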
Richard
As PG points out, "The BY summaries are reported in separate tables (pages) whereas the CLASS summaries appear in a single table."
To me, this is the biggest difference. If I want to compare statistics across categories (such as male/female), I will use the CLASS statement when I want one table that contains all of the statistics (which is most of the time).
As said, when comparing stats I usually go for the CLASS statement. As others have pointed out, BY requires the input data to be sorted. However, if your input data is coming from a database library (and not a SAS data set), PROC MEANS is smart enough to ask for the data in "sorted" order before calculating the stats, so a separate PROC SORT step is not necessary. In fact, with SAS 9.2 and later, PROC MEANS will actually get the database to calculate the basic stats if it can: Avg, Sum, Min, Max and so on -- thus saving lots of I/O within your SAS session.
Other "database-aware" procs include FREQ, RANK, SUMMARY/MEANS, REPORT, TABULATE. See this SAS Note for more information.
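If you want to see what is actually sent to the database, something like the following sketch may help. The libref, engine, credentials, table and variable names are all placeholders I made up, and whether the statistics themselves are computed in the database depends on your SAS/ACCESS engine and settings such as the SQLGENERATION= system option.

```sas
/* Hypothetical connection: engine, credentials, schema and the   */
/* table MYDB.SALES are placeholders, not a real system.          */
libname mydb oracle user=myuser password=XXXX path=mypath schema=myschema;

/* Write the SQL that SAS passes to the database to the SAS log.  */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* No separate PROC SORT step here: the BY ordering is requested  */
/* from the database.                                             */
proc means data=mydb.sales n mean sum min max;
  by region;
  var amount;
run;
```

Check the log after the step runs to see the query that was generated for your setup.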
Chris