BookmarkSubscribeRSS Feed

The Visual Analytics Alphabet Series – A is for Aggregated Data!

Started Thursday by
Modified Thursday by
Views 131

ABC Blocks.pngThis is the first part of a multi-part series that discusses some of the distinctive features of SAS Visual Analytics.

  • A is for Aggregated Data (this article!)

 

 

 

 

A few weeks ago, my husband and I went to see Scream 7 in the theaters (we are both HUGE horror fans). During the previews, they did a recap of everything you need to know about the Scream series, from A to Z. Inspiration struck! This is the perfect format for showcasing all of the great known (and sometimes little known) features of SAS Visual Analytics… thus the Visual Analytics Alphabet Series was born.


I immediately got to work and discovered that not only could I find features for every letter of the alphabet (some may be a bit of a stretch… I’m looking at you X and Z), but some letters needed to be used more than once.

 

Today, is brought to you by the letter A… which is for Aggregated Data.

 

Aggregated data sources were introduced in SAS Visual Analytics 8.3 as a way for report viewers to increase the efficiency of their reports. The goal was to create a smaller table with totals (or aggregations) for a subset of columns and apply this data source to multiple objects. To maximize efficiency gains, it was suggested that report viewers select category columns with few distinct counts. Any measure data items that are included in the aggregated data source are aggregated (depending on the aggregation specified for the measure) for each subset of the distinct crossing of the categories selected. Benchmarking can then be performed to ensure that report performance has improved, which is not guaranteed.

 

Pro Tip! You can access performance statistics for your reports by clicking Ctrl+Alt+P and selecting Data in the drop-down list.

 


Very quickly, however, it was discovered that aggregated data sources have an even more desirable use; they can be used to group aggregated measures.

 

Let’s consider a scenario. We want to analyze data from various horror movies (as a shout out to the inspiration of this series). We have two data sources that contain the information we need: HORROR_MOVIES contains a description of each movie, the tagline, release date, original language, genre, the collection, a popularity score, budget, revenue, and runtime. KILLCOUNTS has the release year and the total confirmed kills for the movie. For more details about the data and the data preparation steps undertaken, please see The Visual Analytics Alphabet Series – The Data.

 

Specifically, we would like to determine how many movie collections (or movie franchises, like Halloween or Scream) contain a low average kill count (less than 50), a medium average kill count (between 50 and 100), or a high average kill count (more than 100).


We have created an aggregated measure (Average Kill Count) that counts the total kills for all movies in a collection and divides by the number of distinct movies in the collection.

A_AggregatedMeasure.png

 

Pro Tip! Aggregated measures are calculated on the context specified. ByGroup indicates that the calculation should be performed for each group (or each distinct crossing of category data items in the aggregated data source. ForAll indicates that the calculation should be performed on all the data.

 

To classify each collection as Low, Medium, or High, we want to create a custom category or calculated item. Average Kill Count, however, is an aggregated measure (not a regular measure), so we cannot do the grouping on the original table. Only categories and measures can be used to create a custom category or calculated item.


To remedy this issue, we can create a small, aggregated table of Collection ID, Collection Name, and Average Kill count, which has 106 rows (the number of collections) and 3 columns.

 

Note: In this instance Average Kill Count will be calculated for each movie collection by taking the total kill count for that collection and dividing by the number of movies in that collection.

 

To create an aggregated data source, select Actions > New data from aggregation of <data-source> on the Data pane. This opens the New Aggregated Data window where you can specify the name of the aggregated data source (Movie Details Aggregated) and the data items to use to build the aggregated table (Collection ID, Collection Name, and Average Kill Count). A preview of the new aggregated table appears at the bottom of the window.

A_NewAggregatedData.png

 

Because not all horror movies are part of a collection, we will add a data source filter to remove data where Collection ID = ‘NA’. To do this, click Filters > New Filter in the right pane of the New Aggregated Data window and select Collection ID.


Then, click the More button next to Collection ID and select Advanced edit. In the Expression editor, add the following expression:

A_Filter.png

 

A new temporary table is created in your personal CASUSER caslib, added as a data source to the Data pane, and can be used in your report. This temporary table is available for the current report only and is cleaned up when no longer needed (that is, when the user closes the report).

 

Notice, for the new table Average Kill Count appears as a regular measure, so we can use it to create a custom category or a calculated item.

A_DataPane.png

 


For this scenario, we can create a custom category (Carnage Level) that categorizes the average kill count for each collection.

A_CustomCategory.png

As an alternative, you can create a calculated item.

A_CalculatedItem.png

The new data item can be used in any report object to show the number of collections that have a low, medium, or high average kill count.

A_ListTable.png

 


Be aware that a temporary table is created for each user that accesses the report. This could create performance issues if many users access the report concurrently and the temporary tables are large. In this case, it may be better to create the aggregated data outside of Visual Analytics (SAS Studio, I’m looking at you!) instead. Because of this, aggregated data is not the best option for high-traffic reports and should not replace your data preparation tool.


Aggregated data sources, however, are great for exploration, prototyping, and ad-hoc analysis and can be a great way to get results faster, experiment with the level of detail, and validate the aggregation before putting it in production.


For more information about aggregated data, check out these resources:
Self-Service Data Preparation in SAS Visual Analytics: A Report Designer’s Delight 
SAS Visual Analytics Advanced Calculations: Data Source Operations 

 

Contributors
Version history
Last update:
Thursday
Updated by:

Catch up on SAS Innovate 2026

Dive into keynotes, announcements and breakthroughs on demand.

Explore Now →

SAS AI and Machine Learning Courses

The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.

Get started

Article Tags