This question came up in a recent conversation with academic colleagues — but the lesson applies far beyond a single dataset or classroom.
The original question was simple:
Why does the MEGACORP2020 dataset produce different descriptive statistics in SAS Information Catalog and SAS Visual Analytics?
More broadly: why can descriptive statistics differ across SAS Viya applications, even when the underlying data is the same?
We'll examine the profit variable and see that, while the mean is the same, the minimum, maximum, and skewness values are different. The proof:
[Screenshot: profit statistics in SAS Information Catalog]
[Screenshot: profit statistics in SAS Visual Analytics]
My response to the professor - written with the help of my good friend, ChatGPT:
Even when two SAS Viya applications point to the same dataset, they may compute descriptive statistics differently because they are designed for different analytical purposes.
In our example, both SAS Information Catalog and SAS Visual Analytics report the same mean, but different values for minimum, maximum, and skewness. This does not indicate an error.
Instead, it reflects how - and on what data - the statistics are computed.
In SAS Visual Analytics, descriptive statistics are typically computed from the entire dataset (or from a clearly defined filtered query).
Conceptually, this is similar to running:
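/* Default output: N, mean, standard deviation, minimum, maximum */
proc means data=MEGACORP2020;
   var profit;
run;
or
/* Fuller output, including skewness, quantiles, and extreme observations */
proc univariate data=MEGACORP2020;
   var profit;
run;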
Key characteristics:
SAS Information Catalog serves a different role: data discovery and metadata profiling. Its goal is to quickly help users understand large datasets across many columns.
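To remain performant, especially on large tables, Information Catalog may profile a sample of rows rather than the full table.
This is why the mean can agree between the two applications while the minimum, maximum, and skewness do not.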
In other words, SAS Information Catalog answers:
“What does this column generally look like?”
Visual Analytics answers:
“What are the exact statistics for this analysis?”
Skewness depends heavily on the values in the tails of the distribution, far from the mean.
Even a small sample that misses rare extremes can materially change skewness, while leaving the mean largely unchanged. This makes skewness a strong signal that different row sets were used in the computation.
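To make this concrete, here is sample skewness in a common bias-adjusted form. As far as I know this matches what PROC MEANS reports by default, but treat that as an assumption and check the documentation for your release. Here n is the number of rows actually used, x-bar is their mean, and s is their sample standard deviation:

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}},
\qquad
\text{skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^{3}
\]
```

Each row enters the mean linearly, but it enters skewness through a cubed standardized deviation, so a handful of extreme profit values can dominate the sum. Leave those rows out and skewness moves noticeably, along with the minimum or maximum, while the mean shifts very little.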
Identical datasets do not guarantee identical statistics unless the computational context is the same.
Before trusting or comparing summary statistics, always ask how, and on which rows, they were computed.
Understanding why numbers differ is often just as important as the numbers themselves.
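If you want to see the effect for yourself, one rough way is to compute the same statistics on the full table and on a random sample, then compare. The sketch below is only an illustration, not how SAS Information Catalog samples internally. It assumes MEGACORP2020 sits in the WORK library with a numeric column named profit; the 10% rate, the seed, and the output table name megacorp_sample are arbitrary choices of mine.

```sas
/* Sketch only: assumes WORK.MEGACORP2020 exists and has a numeric
   column named profit. */

/* 1. Statistics computed on every row */
title "Full table";
proc means data=MEGACORP2020 n mean min max skewness;
   var profit;
run;

/* 2. Draw a simple random sample.
      The 10% rate and the seed are arbitrary, illustrative values. */
proc surveyselect data=MEGACORP2020 out=work.megacorp_sample
                  method=srs samprate=0.10 seed=2020;
run;

/* 3. The same statistics, computed on the sample only */
title "10% simple random sample";
proc means data=work.megacorp_sample n mean min max skewness;
   var profit;
run;

/* Clear the title so it does not carry over to later output */
title;
```

Rerunning steps 2 and 3 with a few different seeds shows how much each statistic moves from sample to sample. On data with rare extreme values, expect the mean to stay fairly stable while the minimum, maximum, and skewness jump around.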
Assist from my colleague, Cristina Anton, @antonbcristina, is in! She shared that sampling is the DEFAULT in SAS Information Catalog - but that defaults can be changed:
SAS Help Center: Analysis Options
So, want to run the full data set? Then you can update the setting here:
Thanks Cristina!