This is the second part of a series of articles about advanced aggregations and calculations. In the previous one we covered the AggregateCells operator, and in this one you will explore the AggregateTable operator.
Have you ever needed to use and aggregated measure inside another aggregated measure? Well, you have probably found out that you were not allowed to do that. The AggregateTable operator is the answer for that.
The AggregateTable operator can be found under the Aggregated (tabular) group, on the left side of the expression window, when creating a new calculated item. These are the parameters it takes, according to the documentation:
The Table operator is where the aggregation and its context are defined. The AggregateTable operator requires a Table operator and, at least in the current release, SAS Visual Analytics 8.3, the Table operator can only be used within an AggregateTable operator. The reason why the Table operator exists as a separate operator under the Aggregated (tabular) group, on the left side of the expression window, is because it can be used as the argument #5 to create nested aggregations:
You can think of the output of the Table operator as a table, with its measure (parameter #5) aggregated (according to parameter #2) and grouped by the categories provided or resolved by the context (parameters #3 and #4). That table is then merged with the next outer level, which is either defined by another Table operator (in case they are nested) or the visual object in the report where this calculation is used. In case the set of group-by’s in the outer level does not include all the group-by’s from the inner level, then a reaggregation takes place. If the outer level is another Table operator, the reaggregation is performed according to its parameter #2. If the outer level is the visual object in the report the reaggregation applied is given by parameter #1.
When described like that, this all may seem a bit confusing, but it’s actually very simple. Let’s look at some examples to clarify.
In the crosstabs that follow below, you have sales broken by country and product type, as well as different variations to the AggregateTable parameters to see how the calculation is affected. The name of the calculations indicates the aggregation used, the type of the group-by crossing, and the categories (parameters #2, #3, and #4 respectively).
Sum Add  gives the same results as the default sum (Sales column) because you are not adding any category to the list of categories that you already have in the crosstab (Country and Product type in this example):
The same would have happened if you had added Country, or Product type, or both to the list of categories to create Sum Add [Country], Sum Add [Product type], and Sum Add [Country, Product type], because they do not affect the group-by context already defined by the visualization – the outer group-by context given by the crosstab includes them all. For similar reason, Sum Remove [Region] would not change the results above either.
The documentation states that to perform an aggregation on the entire data source, you must select Fixed as the type of the group-by and do not specify any categories in the group-by. That’s exactly what Sum Fixed  is doing, and the results are all the same and equal to the grand total:
Sum Fixed [Country] returns the subtotal for each country, and that’s the value you see independent of the Product Type value. In this specific crosstab visualization, Sum Remove [Product type] would give the same results:
Sum Remove [Country] will return the values equivalent of sales by Product type only. So, subtotal for FURNITURE = 97,864 + 101,194 + 91,567 = 290,625, and subtotal for OFFICE = 149,126 + 144,804 + 145,782 = 439,712:
In this example you are going to understand how the outer most aggregation (parameter #1) works. For that aggregation to be used, the crosstab object must have fewer group-by crossings than what’s determined by the Table operator. That’s exactly what happens when Avg Sum Fixed [Country, Product Type] is added to a crosstab with Country only, or at the country subtotal and grand total rows in the crosstabs below:
Because Product type is not present in the outer level (the visualization) of the first crosstab, neither in the subtotal or grand total rows in the second crosstab, the reaggregation takes place and average is applied, and the input values for the average calculation are those found in the crossings of Country and Product type. This already gives you the “aggregation of an aggregation”, but because it only happens when group-by categories are not present, the expression works in “exception” mode. The more controlled expression to implement aggregation of an aggregation is discussed next.
What if you want the average sales number for the totals sales for each country? This is accomplished by first computing the sum of sales by country, and then calculating the average of those aggregated numbers. A simple aggregated measure that calculates average “for all” would not work as expected, as it would be averaging the individual values, not the aggregated ones. The trick to solve this is to nest Table operators in the AggregateTable, like in the Avg Total Country Sales expression below:
The most inner Table operator with Add  was explored in the Example 1 and it returns the sum of sales for all the categories in the visualization (country in this case). The outer Table operator takes that result (sum of sales by country) and calculates the average. Again, what Fixed  does was explained in the Example 1. The value of the most outer aggregation doesn’t really matter in this case because Fixed  doesn’t have any category listed (it includes the entire input table) and the visualization will never have fewer group-by categories than that (zero), so no reaggregation occurs at that level and any aggregation works the same because this parameter is not used. Average was chosen in this case just to remind us of what this calculation is doing. If you use something like kurtosis it would give the same results, but it could cause confusion - or scare some of us 😊.
Using Add  in the most inner Table operator has the advantage that this expression is generic enough to work for any category that you use to replace Country, but it might not work as expected in some cases. For example, if you add Product type to the previous crosstab, and turn on totals and subtotals, you will see that the value of $243,445.67 previously calculated for the Avg Total Country Sales only shows up at the country subtotal level. That’s expected because group-by contexts that don’t specify a fixed category will dynamically change according to the crossings in the visualization. If you want a calculation that returns the same average value of $243,445.67 for all rows in the crosstab you would need to slightly modify the expression, as in Avg Total Country Sales (II) below. By specifically fixing Country in the most inner Table operator it returns the sales subtotals by country only, even for other crosstab crossings, like those broken down into individual product types and the grand total:
There is another way you can accomplish “aggregated measure of an aggregated measure”. It uses aggregated data sources, which is discussed in the next article.
Suppose you have sales numbers by country and you want to know how the other countries are performing when compared against Germany.
The first thing you would need to do is calculate the sales in Germany, which is easily done with a conditional expression:
If you now display that information in a table, you will notice that only the row for Germany has the expected German Sales, while it’s zero for the other countries. Obviously, any attempt to compare other countries sales values against Germany’s will fail:
In the example above, if you rename German Sales to German Sales Aux and use that to recalculate German Sales with the AggregateTable operator, this is what you would get:
This allows you to refer to sales numbers in Germany independently of the country that you are looking at. Problem solved!
In the next article you will be exploring aggregated data sources and data source joins.
Additional resources on advanced calculations and aggregations:
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.