SAS Visual Analytics Advanced Calculations (part 2 of 4): AggregateTable

6 Likes

This is the second part of a series of articles about advanced aggregations and calculations. In the previous one we covered the AggregateCells operator, and in this one you will explore the AggregateTable operator.

AggregateTable

Have you ever needed to use and aggregated measure inside another aggregated measure? Well, you have probably found out that you were not allowed to do that. The AggregateTable operator is the answer for that.

The AggregateTable operator can be found under the Aggregated (tabular) group, on the left side of the expression window, when creating a new calculated item. These are the parameters it takes, according to the documentation:

01-AggregateTable operator

The aggregation that is applied to the aggregated item (aggregated item is given by the inner Table operator) when it’s used in an object that displays fewer group-by crossings than the inner table in your expression. We will clarify that in a while.
The aggregation that is used in the inner table context.
The type of the group-by crossing (given by the list of categories in #4)
1. Fixed
2. Add
3. Remove
The list of categories that are used to define/alter the data crossing for the aggregation
The measure to be aggregated. You can also add a Table expression here to create nested aggregations.

The Table operator is where the aggregation and its context are defined. The AggregateTable operator requires a Table operator and, at least in the current release, SAS Visual Analytics 8.3, the Table operator can only be used within an AggregateTable operator. The reason why the Table operator exists as a separate operator under the Aggregated (tabular) group, on the left side of the expression window, is because it can be used as the argument #5 to create nested aggregations:

02-Aggregate (tabular) operators

You can think of the output of the Table operator as a table, with its measure (parameter #5) aggregated (according to parameter #2) and grouped by the categories provided or resolved by the context (parameters #3 and #4). That table is then merged with the next outer level, which is either defined by another Table operator (in case they are nested) or the visual object in the report where this calculation is used. In case the set of group-by’s in the outer level does not include all the group-by’s from the inner level, then a reaggregation takes place. If the outer level is another Table operator, the reaggregation is performed according to its parameter #2. If the outer level is the visual object in the report the reaggregation applied is given by parameter #1.

When described like that, this all may seem a bit confusing, but it’s actually very simple. Let’s look at some examples to clarify.

Example 1 – Understanding parameters #2, #3, and #4

In the crosstabs that follow below, you have sales broken by country and product type, as well as different variations to the AggregateTable parameters to see how the calculation is affected. The name of the calculations indicates the aggregation used, the type of the group-by crossing, and the categories (parameters #2, #3, and #4 respectively).

Sum Add [] gives the same results as the default sum (Sales column) because you are not adding any category to the list of categories that you already have in the crosstab (Country and Product type in this example):

03-Example 1: Sum Add[]

04-Example 1: Sum Add[]

The same would have happened if you had added Country, or Product type, or both to the list of categories to create Sum Add [Country], Sum Add [Product type], and Sum Add [Country, Product type], because they do not affect the group-by context already defined by the visualization – the outer group-by context given by the crosstab includes them all. For similar reason, Sum Remove [Region] would not change the results above either.

The documentation states that to perform an aggregation on the entire data source, you must select Fixed as the type of the group-by and do not specify any categories in the group-by. That’s exactly what Sum Fixed [] is doing, and the results are all the same and equal to the grand total:

05-Example 1: Sum Fixed[]

06-Example 1: Sum Fixed[]

Sum Fixed [Country] returns the subtotal for each country, and that’s the value you see independent of the Product Type value. In this specific crosstab visualization, Sum Remove [Product type] would give the same results:

07-Example 1: Sum Fixed[Country]

08-Example 1: Sum Fixed[Country]

Sum Remove [Country] will return the values equivalent of sales by Product type only. So, subtotal for FURNITURE = 97,864 + 101,194 + 91,567 = 290,625, and subtotal for OFFICE = 149,126 + 144,804 + 145,782 = 439,712:

09-Example 1: Sum Remove[Country]

10-Example 1: Sum Remove[Country]

Example 2 – Understanding parameter #1

In this example you are going to understand how the outer most aggregation (parameter #1) works. For that aggregation to be used, the crosstab object must have fewer group-by crossings than what’s determined by the Table operator. That’s exactly what happens when Avg Sum Fixed [Country, Product Type] is added to a crosstab with Country only, or at the country subtotal and grand total rows in the crosstabs below:

11-Example 2: Avg Sum Fixed[Country,Product Type]

12-Example 2: Avg Sum Fixed[Country,Product Type]

13-Example 2: Avg Sum Fixed[Country,Product Type]

Because Product type is not present in the outer level (the visualization) of the first crosstab, neither in the subtotal or grand total rows in the second crosstab, the reaggregation takes place and average is applied, and the input values for the average calculation are those found in the crossings of Country and Product type. This already gives you the “aggregation of an aggregation”, but because it only happens when group-by categories are not present, the expression works in “exception” mode. The more controlled expression to implement aggregation of an aggregation is discussed next.

Example 3 – Aggregation of an aggregation

What if you want the average sales number for the totals sales for each country? This is accomplished by first computing the sum of sales by country, and then calculating the average of those aggregated numbers. A simple aggregated measure that calculates average “for all” would not work as expected, as it would be averaging the individual values, not the aggregated ones. The trick to solve this is to nest Table operators in the AggregateTable, like in the Avg Total Country Sales expression below:

14-Example 3: Aggregation of aggregation

15-Example 3: Aggregation of aggregation

The most inner Table operator with Add [] was explored in the Example 1 and it returns the sum of sales for all the categories in the visualization (country in this case). The outer Table operator takes that result (sum of sales by country) and calculates the average. Again, what Fixed [] does was explained in the Example 1. The value of the most outer aggregation doesn’t really matter in this case because Fixed [] doesn’t have any category listed (it includes the entire input table) and the visualization will never have fewer group-by categories than that (zero), so no reaggregation occurs at that level and any aggregation works the same because this parameter is not used. Average was chosen in this case just to remind us of what this calculation is doing. If you use something like kurtosis it would give the same results, but it could cause confusion - or scare some of us 😊.

Using Add [] in the most inner Table operator has the advantage that this expression is generic enough to work for any category that you use to replace Country, but it might not work as expected in some cases. For example, if you add Product type to the previous crosstab, and turn on totals and subtotals, you will see that the value of $243,445.67 previously calculated for the Avg Total Country Sales only shows up at the country subtotal level. That’s expected because group-by contexts that don’t specify a fixed category will dynamically change according to the crossings in the visualization. If you want a calculation that returns the same average value of $243,445.67 for all rows in the crosstab you would need to slightly modify the expression, as in Avg Total Country Sales (II) below. By specifically fixing Country in the most inner Table operator it returns the sales subtotals by country only, even for other crosstab crossings, like those broken down into individual product types and the grand total:

16-Example 3: Aggregation of aggregation

17-Example 3: Aggregation of aggregation

There is another way you can accomplish “aggregated measure of an aggregated measure”. It uses aggregated data sources, which is discussed in the next article.

Example 4 – Referencing one specific cell

Suppose you have sales numbers by country and you want to know how the other countries are performing when compared against Germany.

The first thing you would need to do is calculate the sales in Germany, which is easily done with a conditional expression:

18-Example 4: Sales in Germany

If you now display that information in a table, you will notice that only the row for Germany has the expected German Sales, while it’s zero for the other countries. Obviously, any attempt to compare other countries sales values against Germany’s will fail:

19-Example 4: Sales in Germany not good for comparison

In the example above, if you rename German Sales to German Sales Aux and use that to recalculate German Sales with the AggregateTable operator, this is what you would get:

20-Example 4: Sales in Germany good for comparison

21-Example 4: Sales in Germany good for comparison

This allows you to refer to sales numbers in Germany independently of the country that you are looking at. Problem solved!

In the next article you will be exploring aggregated data sources and data source joins.

Additional resources on advanced calculations and aggregations:

Documentation: Aggregated tabular operators
Community article: SAS Visual Analytics Advanced Calculations (part 1 of 4): AggregateCells by SAS' Renato Luppi
Community article: SAS Visual Analytics Advanced Calculations (part 3 of 4): Data Source Operations by SAS' Renato Luppi
Community article: SAS Visual Analytics Advanced Calculations (part 4 of 4): PMPM Example by SAS' Renato Luppi

acordes · ‎04-24-2020

Dear Renato, that's a nice series about sophisticated VA stuff.

I've applied a lot of your tricks and they're becoming part of my tool kit.

But I'm struggling with combining a aggregated cell with a by-group construct.

The cross table shows correctly what I want to accomplish, the sales figures (freq) as accumulated over month by type of product VN or VO.

But the time series plot does not plot it the way I expected.

The orange line should start at 0.

I've tried every possible combination of aggregated measures (table, aggregated table | cell) but I cannot manage to achieve my goal.

I even created a char representation of the date variable.

Without success. aggregated measure.png

YannRC · ‎03-09-2021

Hello, I just tried to apply but it doesn't work. I'm trying to sum up an aggregate value. SAS tells me that it is not possible to nest aggregate values. Is this really the case? Sorry for my English.

Renato_sas · ‎03-09-2021

@YannRC , do you mind sharing the expression you are trying to use, as well as any other expressions being used by other calculations in your main expression?

Renato_sas · ‎03-09-2021

@acordes , I'm really sorry, but for some reason I've missed your question above and I just noticed it now. It's an old question, but I've tested it with my own data and got a different behavior, still not what I was expecting, though, but different than yours. I've opened a Tech Support track and will keep you posted. I've used the AggregateCell operator because I thought it would be a more straight forward expression. If you used something different, please let me know.

touwen_k · ‎09-19-2021

Thank you for this article. I think that you have just solved two of my three problems. I put sales hard coded for 2020 and then applied this formula. I think it works! Many thanks!

The third problem is: if you want to count a unique number of customers based on customer number and then calculate average number of customers for different categories. I do not know how to do that, could you advise?

Renato_sas · ‎09-20-2021

@touwen_k, I guess the trick is to calculate the distinct values in an aggregated data source first, as explained in part 3, then use that table in your crosstab to calculate the average.

For example, if you want the distinct number of customers per State and then the average of that per Country, you would:

1. Derive a calculated item (e.g. Customer Distinct Count) for distinct number of customers (based on their ID)

2. Create an aggregated data source containing Country, State, and Customer Distinct Count

3. In the resulting aggregated data source, rename Customer Distinct Count to Customer Distinct Count per State (because this is what it really is now)

4. Change the default sum aggregation of Customer Distinct Count per State to be average

5. Create a crosstab with Country, and Customer Distinct Count per State out of the aggregated data source

You could use the AggregateTable operator for step #4, but it's not necessary.

touwen_k · ‎09-22-2021

thank you for your answer, it will work for one category.

But the request I have received are for almost 6 categories in hierarchy ( they are used to filter the page). When I created aggregated source, then the distinct count will be correct for my smallest category, but not for the all the 5 broader categories. I do not know at this moment how to calculate average or moving average for that.

Renato_sas · ‎09-22-2021

When you say 6 categories in the hierarchy, is it a true hierarchy data item, or are you just displaying 6 categories side-by-side in the crosstab? If it's a true hierarchy, do you have totals and subtotals turned on? This solution will not work for totals and subtotals because the distinct count is performed at the most granular level that you define, before the crosstab query is processed. If you don't have totals and subtotals and you want to filter based on those 6 categories, you can redefine your aggregated data source and add advanced filters based on parameters. Something like this:

category1 = parameter1 and category2 = parameter2 and ... and category6 = parameter6

Whether or not categories 1 to 6 should be in your aggregated data source definition will depend on if you want to have them in the crosstab or not.

Renato_sas · ‎09-22-2021

One more thing, sorry...

I can think of a way to solve this calculation, but I'd need to know all the requirements. Also, if we are talking about average of distinct count, we must define a level where the distinct count is performed, so it can be averaged in the upper levels. In my example above, Country+State was that level where distinct count was being performed. Any other levels/crossings would be calculated by averaging those base-level values.

touwen_k · ‎09-23-2021

I will come back to this topic and prepare detailed description, but would it be possible to ask somthing else, as I need to solve it right now? I am beginner in VA. I need to calculate sales for last week, last 52 wks and last 12 wks. Last 12 wks is was thinking to use cumulative period and offset per quarter. It will not be the the rolling 12 wks, when the quarter is running, but this is the standard calcualtion available. Could you support me with advice on the other measures how to calculate? It cost me quite some time until I will arrive at it myself.

acordes · ‎09-23-2021

@touwen_k that's the way learning happens 🙂

It's hard but it's rewarding.

I've replied to the question you have raised in the post you had opened.

Renato_sas · ‎09-23-2021

Hi @touwen_k,

I'll do my best to help, but I would need additional information:

What's the VA version you are using?
What do the source table columns look like? I mean, do you have information about week, or it's a date column?
Does your week start on Sunday, Monday, etc., or it doesn't matter?
How do you want to visualize sales information for last N numbers of week? Is it one KPI number using today's date as the baseline, the last date available in your source table, or some other reference date? Or you want to see it in a table where each row has a reference date or week?
When you say "last week", accumulating sales for past 7 running days works for you?
An example of how you would like to visualize the information and how you would like it to be calculated would be helpful.

Renato_sas · ‎09-23-2021

I understand you might be under time pressure, so I've tried something for you.

I'm using VA 8.5 on Viya 3.5
My source table has a date column that I used to calculate Week of
My week starts on Sunday
I've used a crosstab object to visualize the Sales figures. The date reference for the calculation is Week of
Last week sales accumulates the sales on the previous week, Last 12 weeks sales accumulates the previous 12 (it does not include the current week, but the formula allows for adjustments), and so on (see picture)

Calculations:

Week of

Last week sales

Last 12 weeks sales

Last 52 weeks sales

The AggregateCells operator is explained here.

touwen_k · ‎09-24-2021

thank you very much for your help. It is amazing what you did, or awesome as American say! I had no idea that aggregate cells is the answer to this problem. I was trying to use cumulative operator.

I am using Viya 4. At this moment it is not important if the week starts on Sunday or Monday.

I have tried out the aggregate cells formula. It works beautifully with crosstab or list table if you have all the weeks, starting from 01Jan2019 to current week in 2021. I believe you cannot retrieve it as a single value to put into a crosstab or text, can you?

However the request is to have visualisation in a crosstab as one single sum, pls see below and preferably with a drop down to choose last week or last 12 wks. As this is not a standard function, I will try to make cumulative calculations per quarter to today and then build a filter to limit to current month, so that you can compare the same periods if the quarter is not finished. I hope it will work. For last wk I am struggeling with parallel period as I would like to have only one last wk in my table, but it needs at least two wks to show the difference. How can you have only one single result for parallel period for last wk?