JHT
Calcite | Level 5

We were having an (admittedly academic) discussion on the differences between using CLASS versus BY in a PROC MEANS step. Performance issues aside, are there any differences? Some of my colleagues vaguely recalled something about missing values being treated differently, but we couldn't reproduce this. Are there differences (again, performance aside), or did we remember incorrectly? Thanks.

4 REPLIES
Rick_SAS
SAS Super FREQ

I don't think there are any numerical/statistical differences. I find the CLASS statement convenient when I want to see all of the output in a single table; the BY-group approach puts each BY group's statistics on a separate page. Also, you need to sort the data to use a BY statement, but not to use the CLASS statement. The BY-group approach is more efficient when the data are already sorted, and it requires less memory. The output data sets also look different for the two approaches.
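To make the comparison concrete, here is a minimal sketch of the two approaches side by side. The data set (sashelp.cars), the variables Origin and MSRP, and the work.cars_sorted name are illustrative choices, not part of the original discussion.

```sas
/* CLASS: no sort required; all groups appear in a single table */
proc means data=sashelp.cars n mean;
  class Origin;
  var MSRP;
run;

/* BY: the input must first be sorted by the BY variable */
proc sort data=sashelp.cars out=work.cars_sorted;
  by Origin;
run;

/* One block of output per BY group */
proc means data=work.cars_sorted n mean;
  by Origin;
  var MSRP;
run;
```

The statistics for each Origin should agree between the two steps; only the layout of the report differs.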

Astounding
PROC Star

Rick, you have it exactly right.  I just wanted to expound upon one of your points.

Comparing CLASS STATE COUNTY; vs. BY STATE COUNTY;

In the output data set using BY, there is one observation for each STATE/COUNTY combination.

In the output data set using CLASS, you get those same observations, plus:  one observation holding a summary for the entire data set, one set of observations holding a summary for each STATE, and another set of observations holding a summary for each COUNTY.  The variable _TYPE_ in the output data sets tells you what the level of summarization is for that observation.
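A small sketch of how those extra observations show up, assuming sashelp.cars with Origin and Type standing in for STATE and COUNTY (the data set and the output names are illustrative only). With two CLASS variables, _TYPE_ takes the values 0 through 3.

```sas
/* CLASS with an OUTPUT statement: every level of summarization is kept */
proc means data=sashelp.cars noprint;
  class Origin Type;
  var MSRP;
  output out=work.class_out mean=mean_msrp;
run;

/* Count observations per level:
   _TYPE_=0 overall, 1 by Type, 2 by Origin, 3 by Origin*Type */
proc freq data=work.class_out;
  tables _TYPE_;
run;

/* NWAY keeps only the most detailed level (here _TYPE_=3),
   which resembles what a BY statement would produce */
proc means data=sashelp.cars noprint nway;
  class Origin Type;
  var MSRP;
  output out=work.nway_out mean=mean_msrp;
run;
```

If only the most detailed combinations are wanted, NWAY (or a WHERE on _TYPE_) avoids having to filter the output data set afterwards.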

The printed reports give you summaries at the most detailed level only, even if the output data sets would be different.  And, as Rick noted, the format of the reports would change.

Finally, your colleagues' recollection is correct.  Any observation where a CLASS variable is missing will be thrown out of the analysis.  The MISSING option changes that, treating missing values like any other value for a CLASS variable.
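The missing-value behavior is easy to reproduce with a tiny made-up data set. The names work.demo, group, and x below are hypothetical, chosen only for this sketch. The last step is included for contrast: a BY statement treats a missing value as its own BY group.

```sas
/* Four observations; the last one has a missing class value */
data work.demo;
  input group x;
  datalines;
1 10
1 20
2 30
. 40
;
run;

/* Default CLASS behavior: the observation with missing group is excluded */
proc means data=work.demo n sum;
  class group;
  var x;
run;

/* MISSING option: the missing value is treated as a valid class level */
proc means data=work.demo n sum missing;
  class group;
  var x;
run;

/* BY: missing values form their own BY group without any option */
proc sort data=work.demo out=work.demo_sorted;
  by group;
run;

proc means data=work.demo_sorted n sum;
  by group;
  var x;
run;
```

If the input data have no missing values in the CLASS variables, the first two steps give identical results, which may be why the difference was hard to reproduce.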

SASKiwi
PROC Star

As a general rule, when you are dealing with large data sets (> 1 GB) and there are many distinct values of the class variables, I have often found that SAS will process faster using BY rather than CLASS, even with the sort time added in. If your data are already sorted in the right order, then the benefit is even greater.
