SAS Visual Analytics: Understanding the AggregateTable Function for Beginners

2 Likes

AggregateTable is a powerful function in SAS Visual Analytics. In this post, I will explain what it is and give two examples on how it can be used. For a more in-depth view of AggregateTable see Renato Luppi's post here. For an example with bee metaphors and Key Value objects, I highly recommend Nicole Ball's post here.

When do I use AggregateTable?

You can use the AggregateTable function whenever you want to:

Compare measures grouped at two different levels.
Take the aggregation of an aggregation.

Take the Employees table below as an example.

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

In this table, we have one observation per employee. We have information on their Job Title, Country and Annual Salary. Here, you might want to compare the salary of an individual employee with the salary of all employees with the same Job Title. You can use the AggregateTable function to display these two measures side by side in the same visualization. See the section Scenario 1: modifying the aggregation context for this.

Alternatively, you might want to calculate the average Annual Salary by Job Title, but only display the maximum of that by Department. See the section Scenario 2: Aggregation of an aggregation for details on this.

In either case, it is necessary to first understand what the aggregation context is.

What is the aggregation context?

When you create a visualization in SAS Visual Analytics, measures are automatically grouped by the categories in that object. These categories are also called the aggregation context. Using the Employees table (shown above) as a data source, I create a list table with Job Title and Annual Salary as columns. Here, Job Title is a category and Annual Salary is a measure. I have changed the aggregation of Salary from the default sum to average. I obtain the following list table:

Here the aggregation context is Job Title, because that is the only category in this object. This list table allows us to see what the average Annual Salary for each Job Title is. On the other hand, when I create a list table with Employee Name and Annual Salary, the aggregation context is Employee Name:

Here, I can see each employee’s individual Annual Salary. (In this case the data is actually unaggregated as the input table already has one line per employee.) To summarize: Visual Analytics always groups the measures by the displayed categories in a visualization. These categories are known as the aggregation context.

Scenario 1: Modify the aggregation context

I might want to display each individual employee’s Annual Salary next to the average for their Job Title. Therefore, one might create a list table with Employee Name, Job Title and Salary. However, the aggregation context for both Annual Salary columns would be Employee Name and Job Title, resulting in the following list table:

By using the default aggregation behavior of the list table object, I am not able to get two different levels of aggregation. Notice that both salary measures are grouped by the combination of Employee Name and Job Title. I want a single visualization to return the Annual Salary grouped by Employee Name and then the average Annual Salary grouped by Job Title. We can solve this by creating a new data item using the AggregateTable function. There are two different ways of achieving this.

Option 1: Use a fixed aggregation context

I will specify that Annual Salary should be grouped by Job Title only. That is, I shall create a new measure called Salary (Job Title) where I will fix Job Title as the aggregation context for this data item. I specify the following:

This results in the following formula:
AggregateTable(_Avg_, Table(_Avg_, Fixed('Job Title'n), 'Annual Salary'n))

I will explain in scenario 2 why I specify _Avg_ (average) as the aggregation twice (short answer: the second and final aggregation is not applied in this scenario).

This new data item will always display salary grouped by Job Title:

Note that, for example, Annual Salary (Job Title) has the same value for rows 1 and 2, as both employees have the same Job Title.

Option 2: Remove a category from the aggregation context

Recall that the problem earlier was that Annual Salary was grouped by the combination of Employee Name and Job Title. I can remove Employee Name from the aggregation context so only Job Title remains:

The formula looks as follows:
AggregateTable(_Avg_, Table(_Avg_, Remove('Employee Name'n), 'Annual Salary'n))

It delivers the same result:

There is an advantage to using this approach: I can reuse this calculated item in another list table where I compare each employee’s salary to the average in their department instead of Job Title (Columns: Employee Name, Department, Annual Salary). This works as Visual Analytics always removes Employee Name from the aggregation context. This leaves the remaining categories: Employee Name, Department, Annual Salary in this example. (The only thing one would have to adapt is the name of the calculated data item.)

Scenario 2: Aggregation of an aggregation

Remember in scenario 1, I had to specify the aggregation _Avg_ twice:
AggregateTable(_Avg_, Table(_Avg_, …

In scenario 1, it didn't make a difference what I selected for the final aggregation. The final aggregation only comes into play when you want to squeeze multiple numbers into one cell in the visualization. I will talk about such a scenario in this section.

Let me first explain the scenario and what is happening behind the scenes. I first want to calculate the average Annual Salary by Job Title and Department, and then display the highest average Annual Salary by Job Title for each Department. This is summarized in the image below:

09_NR_42818_1_Aggregation-of-an-aggregation-overview.png

However, there is a problem: there are two Job Titles in the Sales department. However, in the final list table, I have only one cell for each Department. How does Visual Analytics squeeze two numbers into one cell in the final list table?

10_NR_42818_1_Aggregation-of-an-aggregation-squeeze.png

This is where the second and final aggregation comes into play. By specifying _Max_ as the final aggregation, I take the maximum of the average annual salaries of job titles within each department. This returns us one number per department, which fits neatly into the space I have in the final list table. Now that the theory is clear, I will go ahead and do this in Visual Analytics.

I start with a list table of Department and Annual Salary. This is the visualization to which I want to add the calculated data item:

As you can see, the aggregation context in this list table is Department. I shall now add Job Title to the aggregation context. Therefore, Visual Analytics calculates the average Annual Salary grouped by the existing aggregation context (Department, which is already in the visualization) plus Job Title (which I added in the AggregateTable function). I then specify _Max_ as the final aggregation:

This final aggregation is only relevant when you want to squeeze in multiple values into not enough space in a visualization. More formally, the visualization had fewer group-by crossings than what was specified in the AggregateTable function. I get the final list table:

Conclusion

The AggregateTable function is a powerful tool that comes in handy whenever you need to change the aggregation context in a visualization. It allows us to choose a fixed aggregation context, and remove or add variables from the aggregation context. You can choose a final aggregation when there are too few group-by crossings in the visualization. In scenario 2, we took the average by Job Title and Department, and then the maximum by Department - an aggregation of an aggregation. You can do even more than two aggregations within the same AggregateTable expression, by nesting multiple Table functions, making it extremely versatile.