Now that I’ve gotten you thoroughly interested in what an aggregated data source may be able to do for you - let’s dive into what is a VA 8.3 Aggregated Data Source? Seeing as most people are familiar with the word aggregate, i.e. a whole formed by combining several (typically disparate) elements, how can this be used in a VA report? This new functionality allows a VA report to create a new data source which has, in theory, fewer columns and rows based on the original data source for the report. (I say in theory, because the product does not enforce you to select fewer columns but then, why would you add a duplicate data source to your report?)
For the seasoned report developer, the applications of being able to use an aggregated data source and the original data source together in one report should be planting visions of dashboards in your head. Of course, the next step you would want to take is being able to join the original data source and the aggregated data source together – and you can (squeal now) but that is for my next article, Join Tables in VA Reports. Let’s stick with the aggregated data source and how we can use it to work around the nested aggregated measure limitation and look at ways it can help improve performance when used wisely.
Let’s imagine that the original data source for a report is a wide flat table with lots of rows of data. In this case I will use the Insight Toy data which has 57 columns, 1.4 M rows and about 1.1 GB in size. Now imagine a dashboard such as this:
The underlying data for each visualization is shown below. You can see that two of the visuals use the full data source and the others the aggregated data source. The aggregated data source has 7 columns and only 936 rows and that’s for all 34 years of data!
This new functionality in VA 8.3 provides an out-of-the-box solution to the nested aggregated measure limitation. Previously, to work around this limitation, you would have to load separate data sources to CAS to support the aggregation level you desired for each visualization. Now we can do this within the VA report itself.
If you are just starting off with building reports, you may be wondering, what is a nested aggregated measure? Well, it is an aggregated measure which requires more than one pass through the data. Let’s take a look.
I will be focusing on the List Table object from the report. Here you can see the two columns highlighted that are referred to as nested aggregated measures.
Here’s a high level look at what the columns represent in relation to the time period YearMonth.
To the untrained eye, this looks easy to accomplish – simply take the columns and perform the simple math. Ahhhh... but nothing is ever that easy. Typically, data is not stored aggregated up to the month level in separate columns.
Usually, data is stored at the transaction date level such as this. These are the two columns that I will use to build this summary level table. Remember, this is just a snippet of the data. The Insight Toy data I’m using has 34 years’ worth of daily transactions.
To understand aggregated measures a bit more, let’s look at the expression definitions for each of the columns. Moving from left to right. Let’s assume I am trying to build these expressions from the full data source.
Transaction YearMonth: I duplicated the column Transaction Date and applied the MMMYYYY (MONYY7)
format.
Sales Rep Actual: This column existed in the data. So, I simply left the default aggregation SUM
applied to this column.
Sales Rep Actual (Previous Parallel Period): This is where we define our first aggregated measure. It’s easy to see the _Sum_
aggregation.
Sales Rep Actual (Difference from Previous Parallel Period): This is also an aggregated measure. It will look similar to the earlier example but with a subtraction used in the expression.
Hint: There is an "easy button" for this calculation. Simply right-click on Sales Rep Actual and select New calculation.... Then select the type Difference from previous parallel period and the periodic timeframe Transaction YearMonth.
If you were to right-click on the created aggregated measure, this is what it would look like:
Sales Rep Daily Rate: This is an aggregated measure, but not one that uses a periodic operator. This is simply a ratio of two aggregations.
Sales Rep Daily Rate (Previous Parallel Period): If you remember, the periodic operator contains an aggregation in itself but now we will try to use the Sales Rep Daily Rate expression, which is the ratio of two aggregations, as the operand. This is where we will run into our nested aggregation limitation! The periodic operator requires a non-aggregated measure operand.
Sales Rep Daily Rate (Difference from Previous Parallel Period): This is similar to the previous expression, and we would run into the same nested aggregated expression limitation, but we would have the subtraction of current minus previous periods.
This is where an aggregated data source comes into play. We can use the aggregated data source to do that "inner aggregation" for us. We will aggregate the data to the level we need it so that we can replace the aggregated measure of Sales Rep Daily Rate with a numeric measure expression. Recall how it was broken up into two aggregations.
Select your source data for the report. Then from the Data Pane, use the Data Source Drop-down menu and select New aggregated data source....
Then select the columns you wish to include in your aggregated data source. There are a few key points to remember at this stage:
Here are the seven columns I chose to keep. You can see the lowest level of granularity for my data is at the Year-Month and Facility Continent level. You can also see that the total number of rows returned for my aggregated data source is 936 and is far less than the 1.4 million rows in the original data source.
The aggregation of Sales Rep Actual will satisfy the replacement for aggregation #1 used in the numerator (Sum _ByGroup_ Sales Rep Actual
) of the expression and the Transaction Date (Distinct Count) will satisfy the replacement for aggregation #2 (Distinct _ByGroup_ Transaction Date
).
Click OK. Now I’m able to use this aggregated data source in my VA report. The newly created aggregated data source will be available from the Data Pane.
Now I will use the Aggregated Data Source to rebuild my expressions. Let’s focus on the ones that are the most impacted.
First, Sales Rep Daily Rate. Notice that this expression is now just of type Numeric. This is no longer an aggregated measure since those aggregations were completed with the summarization to create the aggregated data source.
Now let’s look at the Sales Rep Daily Rate (Previous Parallel Period). Since the Sales Rep Daily Rate is now a numeric, we do not get an error for nested aggregations.
And finally, we can create the Sales Rep Daily Rate (Difference from Previous Parallel Period).
You can really appreciate the resulting summary table.
I liked seeing the Sales Rep Actual numbers and how they compared to the previous parallel period as well as the difference between them but I really wanted to see the impact at a daily level. That is what lead me down the path of the nested aggregated measures roadblock. As you can see, the Daily Rate difference can really put into perspective the total variance. If I wanted to take this one step further, I could add in the granularity per sales person since I have the aggregated number of Distinct Count for Sales Reps per Continent per Month, but maybe next time.
Not only does the aggregated data source make this nested aggregated measure possible, but it also means that the all four period calculations are done over 936 rows of data instead of the full data of 1.4 million rows which saves on compute time.
Like any other VA report data source, you can use the aggregated data source to feed multiple objects. I used the aggregated data source to feed both the Time Series Plot and Gauge.
Now you may be wondering, why did I decide to use the full data source for the other objects? For the Geo Map object, I wanted to display the geography hierarchy of Country – State. My aggregated data source is aggregated to the Continent level and I wanted more detail for the Geo Map instead of a single bubble.
The similar is true for the Gauges at the bottom of the report. This object uses the Product Line data item for the Group role. I did not want that level of granularity in summary table.
hello Teri,
very good example and it shows how you should think as a developer of dashboards in VA. I have two questions:
1) how to create column sales rep actual, if you have a column with all the years in one
2) could you add here how you joined the aggregated data and your original table. Your example would be complete then. At this moment I know something but cannot execute to the end.
thank you
Hi Touwen,
1) Can you expand a bit more on your question about how to create the column Sales Rep Actual if the years are all in one. Can you upload a picture of the structure of your data and then a target of what you want?
2) I have an example of the Data Join functionality here: https://communities.sas.com/t5/SAS-Communities-Library/VA-8-3-Join-Tables-in-VA-Reports/ta-p/499974
Thank you,
Teri
hello Teri,
Point 1 is not a problem any more once I added year prompt. Thank you.
2) if you join original source with aggregated table, do you perform usually left join and put all the columns? I have never done it yet. It will be my first aggregation
I have page prompt going from broad category to the smallest category and one aggregated measure number of customers where I need to calculate average or moving average. As of nested aggregation, I am not able to do it. Then when performing aggregation do I need to all of the hierarchies of categories as these will be my page prompts I want to filer on? regards Karolina Touwen
Hi Karolina,
Here is a link to the SAS Documentation which explains the types of joins and their results: Working with Data Source Joins in Reports So it depends on why you want to join your aggregated data source back to the original data source.
From your question on needing to perform a moving average on the number of customers -- this doesn't seem like a number you would want to join back to the original data source but rather just use the aggregated data source to feed the report object. Will you plot this over time or use a crosstab to display your hierarchy?
You may also find this article helpful: SAS Visual Analytics example: moving average
I took a screenshot from the article and added some callouts based on your question.
Thank you,
Teri
hello Terri,
thank you very much. It is amazing that you made this screenshot with callouts. I understand now that mapping is enough. I will try to create it.
hello Teri, the request is to have many filters with categories. If I add the aggregated table, it will calculate the ratio (for example sum net profit margin/ sum net sales)properly for the most detailed category. But this will be untrue for the broader category. How can aggregated table work with more than one prompt for category? regards Karolina T
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.