
Why do descriptive statistics differ across SAS Viya applications?

Started 02-10-2026
Modified 02-12-2026
 

 

The Backstory

 

This question came up in a recent conversation with academic colleagues — but the lesson applies far beyond a single dataset or classroom.

 

The original question was simple:

 

Why does the MEGACORP2020 dataset produce different descriptive statistics in SAS Information Catalog and SAS Visual Analytics?

 

More broadly: why can descriptive statistics differ across SAS Viya applications, even when the underlying data is the same?

 

Some More Pieces


We'll examine the profit variable and see that, while the mean is the same in both applications, the reported minimum, maximum, and skewness values differ. The proof:

 

LGroves_0-1770754684442.png

SAS Information Catalog

 

LGroves_1-1770754785763.png

SAS Visual Analytics

 

Unpacking the Differences


My response to the professor - written with the help of my good friend, ChatGPT:


 

Even when two SAS Viya applications point to the same dataset, they may compute descriptive statistics differently because they are designed for different analytical purposes.

 

Same mean ≠ same computation

 

In our example, both SAS Information Catalog and SAS Visual Analytics report the same mean, but different values for minimum, maximum, and skewness. This does not indicate an error.

 

Instead, it reflects how - and on what data - the statistics are computed.


Visual Analytics: full-data analytical computation

 

In SAS Visual Analytics, descriptive statistics are typically computed from the entire dataset (or from a clearly defined filtered query).

 

Conceptually, this is similar to running:

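/* Mean, min, max, and skewness of profit, computed over every row */
proc means data=MEGACORP2020 mean min max skewness;
  var profit;
run;

or

/* Full distributional summary of profit, including moments and extreme observations */
proc univariate data=MEGACORP2020;
  var profit;
run;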

Key characteristics:

  • Statistics are calculated on all qualifying rows
  • Results reflect the true min, max, and distribution shape
  • Suitable for reporting, modeling, and decision-making

 

Information Catalog: fast profiling using sampling or approximation

 

SAS Information Catalog serves a different role: data discovery and metadata profiling. Its goal is to quickly help users understand large datasets across many columns.

 

To remain performant, especially on large tables, Information Catalog may:

  • Use sampling or approximate profiling for some column statistics
  • Apply different strategies depending on the statistic and column
  • Prioritize responsiveness over exact distributional precision

 

This is why:

  • The mean often matches (it is relatively stable under sampling)
  • Min, max, and skewness may differ (they are highly sensitive to outliers and tail values)
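The effect is easy to reproduce outside either application. The following is a minimal sketch, not a description of how Information Catalog actually profiles data. It assumes a synthetic, right-skewed stand-in column rather than MEGACORP2020, and the table names work.demo and work.demo_sample, the 1% sampling rate, and the seeds are all arbitrary choices for illustration.

```sas
/* Synthetic, right-skewed stand-in for a profit column (not MEGACORP2020) */
data work.demo;
  call streaminit(2020);
  do id = 1 to 100000;
    profit = exp(rand('normal'));   /* lognormal values: long right tail */
    output;
  end;
run;

/* 1% simple random sample; rate and seed are arbitrary */
proc surveyselect data=work.demo out=work.demo_sample
                  method=srs samprate=0.01 seed=2020 noprint;
run;

/* Statistics computed on every row */
title "All rows";
proc means data=work.demo n mean min max skewness;
  var profit;
run;

/* The same statistics computed on the sample only */
title "1% sample";
proc means data=work.demo_sample n mean min max skewness;
  var profit;
run;
title;
```

Because the sample is a subset of the full table, its minimum can never be lower and its maximum can never be higher than the full-data values. Any extreme row the sample fails to draw pulls the reported range inward, while the mean typically stays close.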

 

In other words, SAS Information Catalog answers:

“What does this column generally look like?”

 

Visual Analytics answers:

“What are the exact statistics for this analysis?”


Why skewness is especially affected

 

Skewness depends on:

  • Higher-order moments
  • Tail behavior
  • Extreme values

 

A sample that misses even a few rare extremes can materially change skewness while leaving the mean largely unchanged. This makes a skewness mismatch a strong signal that different row sets were used in the computation.
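For reference, this is the sample skewness that PROC MEANS reports under its default VARDEF=DF setting. Whether either Viya application uses exactly this estimator internally is an assumption here, but the structure shows why tails dominate:

```latex
\text{skewness} \;=\; \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{3},
\qquad
s \;=\; \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^{2}}
```

Each standardized deviation is cubed, so a row that sits 10 standard deviations from the mean contributes 1,000 to the sum, while a row 1 standard deviation away contributes only 1. Dropping a handful of such rows barely moves the mean but can change the skewness substantially.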


Key takeaway for students (and analysts)

 

Identical datasets do not guarantee identical statistics unless the computational context is the same.

 

Before trusting or comparing summary statistics, always ask:

  1. Was the full dataset used?
  2. Was sampling applied?
  3. Is the tool optimized for exploration or for analysis?
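One way to approach the first two questions is to compare the row count a tool reports against the table's actual size. The sketch below assumes MEGACORP2020 is reachable from a SAS session and contains a numeric profit column. The aliases total_rows and nonmissing_profit are made up for illustration.

```sas
/* Count every row and every non-missing profit value in the full table */
proc sql;
  select count(*)      as total_rows,
         count(profit) as nonmissing_profit
  from MEGACORP2020;
quit;
```

If an application reports statistics based on noticeably fewer observations than total_rows, and missing values do not account for the gap, sampling or filtering is a likely explanation.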

 

Understanding why numbers differ is often just as important as the numbers themselves.

Comments

Assist from my colleague, Cristina Anton, @antonbcristina, is in!  She shared that sampling is the DEFAULT in SAS Information Catalog - but that defaults can be changed:

 

SAS Help Center: Analysis Options

 

So, if you want to run the analysis on the full data set, you can update the setting here:

 

SAS Infromation Catalog Setting Adjustment.png

 

Thanks Cristina!

 

 

Last update:
‎02-12-2026 02:42 PM