BookmarkSubscribeRSS Feed

VA 8.3 Aggregated Data Source, an answer to nested aggregated measures and faster reports!

Started ‎09-04-2018 by
Modified ‎09-28-2018 by
Views 7,805

Now that I’ve gotten you thoroughly interested in what an aggregated data source may be able to do for you - let’s dive into what is a VA 8.3 Aggregated Data Source? Seeing as most people are familiar with the word aggregate, i.e. a whole formed by combining several (typically disparate) elements, how can this be used in a VA report? This new functionality allows a VA report to create a new data source which has, in theory, fewer columns and rows based on the original data source for the report. (I say in theory, because the product does not enforce you to select fewer columns but then, why would you add a duplicate data source to your report?)

 

For the seasoned report developer, the applications of being able to use an aggregated data source and the original data source together in one report should be planting visions of dashboards in your head. Of course, the next step you would want to take is being able to join the original data source and the aggregated data source together – and you can (squeal now) but that is for my next article, Join Tables in VA Reports. Let’s stick with the aggregated data source and how we can use it to work around the nested aggregated measure limitation and look at ways it can help improve performance when used wisely.

 

Let’s imagine that the original data source for a report is a wide flat table with lots of rows of data. In this case I will use the Insight Toy data which has 57 columns, 1.4 M rows and about 1.1 GB in size. Now imagine a dashboard such as this:

 

01_ExampleDashboard.png

 

The underlying data for each visualization is shown below. You can see that two of the visuals use the full data source and the others the aggregated data source. The aggregated data source has 7 columns and only 936 rows and that’s for all 34 years of data!

 

02_ExampleDashboard-DataSourcesv4.png

 

 

Why an aggregated data source?

This new functionality in VA 8.3 provides an out-of-the-box solution to the nested aggregated measure limitation. Previously, to work around this limitation, you would have to load separate data sources to CAS to support the aggregation level you desired for each visualization. Now we can do this within the VA report itself.

 

If you are just starting off with building reports, you may be wondering, what is a nested aggregated measure? Well, it is an aggregated measure which requires more than one pass through the data. Let’s take a look.

 

I will be focusing on the List Table object from the report. Here you can see the two columns highlighted that are referred to as nested aggregated measures.

 

03_NestedAggMeasureInListTable.png

 

Here’s a high level look at what the columns represent in relation to the time period YearMonth.

 

04_ListTableDescription.png

 

To the untrained eye, this looks easy to accomplish – simply take the columns and perform the simple math. Ahhhh... but nothing is ever that easy. Typically, data is not stored aggregated up to the month level in separate columns.

 

Usually, data is stored at the transaction date level such as this. These are the two columns that I will use to build this summary level table. Remember, this is just a snippet of the data. The Insight Toy data I’m using has 34 years’ worth of daily transactions.

 

05_FullDataSource.png

 

To understand aggregated measures a bit more, let’s look at the expression definitions for each of the columns. Moving from left to right. Let’s assume I am trying to build these expressions from the full data source.

 

06_LeftToRight.png

 

 

Transaction YearMonth: I duplicated the column Transaction Date and applied the MMMYYYY (MONYY7) format.

 

Sales Rep Actual: This column existed in the data. So, I simply left the default aggregation SUM applied to this column.

 

Sales Rep Actual (Previous Parallel Period): This is where we define our first aggregated measure. It’s easy to see the _Sum_ aggregation.

 

07_SalesRepActualPPP.png

 

Sales Rep Actual (Difference from Previous Parallel Period): This is also an aggregated measure. It will look similar to the earlier example but with a subtraction used in the expression.

 

Hint: There is an "easy button" for this calculation. Simply right-click on Sales Rep Actual and select New calculation.... Then select the type Difference from previous parallel period and the periodic timeframe Transaction YearMonth.

 

08_EasyButton.png

 

If you were to right-click on the created aggregated measure, this is what it would look like:

 

09_SalesRepActualDiffPPP.png

 

Sales Rep Daily Rate: This is an aggregated measure, but not one that uses a periodic operator. This is simply a ratio of two aggregations.

 

10_SalesRepDailyRate.png

 

Sales Rep Daily Rate (Previous Parallel Period): If you remember, the periodic operator contains an aggregation in itself but now we will try to use the Sales Rep Daily Rate expression, which is the ratio of two aggregations, as the operand. This is where we will run into our nested aggregation limitation! The periodic operator requires a non-aggregated measure operand.

 

11_SalesRepDailyRatePPP.png

 

Sales Rep Daily Rate (Difference from Previous Parallel Period): This is similar to the previous expression, and we would run into the same nested aggregated expression limitation, but we would have the subtraction of current minus previous periods.

 

 

How do we get around this limitation? How does the aggregated data source help?

This is where an aggregated data source comes into play. We can use the aggregated data source to do that "inner aggregation" for us. We will aggregate the data to the level we need it so that we can replace the aggregated measure of Sales Rep Daily Rate with a numeric measure expression. Recall how it was broken up into two aggregations.

 

12_Technique.png

 

 

How to create the aggregated data source

Select your source data for the report. Then from the Data Pane, use the Data Source Drop-down menu and select New aggregated data source....

 

13_NewAggregatedDataSource.png

 

Then select the columns you wish to include in your aggregated data source. There are a few key points to remember at this stage:

  • Make sure the type of aggregation you want in your aggregated data source is set in the source data. Once you bring over the data item, without the detail data, you may not be able to aggregate the data differently.
  • If you bring over non-additive data items, such as distinct count, be sure you have selected the correct category columns you wish to use in your visuals and calculations, otherwise the aggregation of these values will be incorrect.
  • Try to select only the data items you need and avoid the "bring everything" mentality. The "bring everything" mentality is for your full data source and this is a subset of that data and is to aid in nested aggregated measure limitation or improve report performance by sourcing multiple visuals to reduce redundant aggregation overhead.

 

Here are the seven columns I chose to keep. You can see the lowest level of granularity for my data is at the Year-Month and Facility Continent level. You can also see that the total number of rows returned for my aggregated data source is 936 and is far less than the 1.4 million rows in the original data source.

 

14_AggDataSourceColumns.png

 

The aggregation of Sales Rep Actual will satisfy the replacement for aggregation #1 used in the numerator (Sum _ByGroup_ Sales Rep Actual) of the expression and the Transaction Date (Distinct Count) will satisfy the replacement for aggregation #2 (Distinct _ByGroup_ Transaction Date).

 

Click OK. Now I’m able to use this aggregated data source in my VA report. The newly created aggregated data source will be available from the Data Pane.

 

15_DataSourcePane.png

 

 

Now I will use the Aggregated Data Source to rebuild my expressions. Let’s focus on the ones that are the most impacted.

 

First, Sales Rep Daily Rate. Notice that this expression is now just of type Numeric. This is no longer an aggregated measure since those aggregations were completed with the summarization to create the aggregated data source.

 

16_NewSalesRepDailyRate.png

 

Now let’s look at the Sales Rep Daily Rate (Previous Parallel Period). Since the Sales Rep Daily Rate is now a numeric, we do not get an error for nested aggregations.

 

17_NewSalesRepDailyPPP.png

 

And finally, we can create the Sales Rep Daily Rate (Difference from Previous Parallel Period).

 

18_NewSalesRepDailyDiffPPP.png

 

You can really appreciate the resulting summary table.

 

19_ListTable.png

 

I liked seeing the Sales Rep Actual numbers and how they compared to the previous parallel period as well as the difference between them but I really wanted to see the impact at a daily level. That is what lead me down the path of the nested aggregated measures roadblock. As you can see, the Daily Rate difference can really put into perspective the total variance. If I wanted to take this one step further, I could add in the granularity per sales person since I have the aggregated number of Distinct Count for Sales Reps per Continent per Month, but maybe next time.

 

Not only does the aggregated data source make this nested aggregated measure possible, but it also means that the all four period calculations are done over 936 rows of data instead of the full data of 1.4 million rows which saves on compute time.

 

Like any other VA report data source, you can use the aggregated data source to feed multiple objects. I used the aggregated data source to feed both the Time Series Plot and Gauge.

 

20_TimeSeriesAndGauge.png

 

 

Now you may be wondering, why did I decide to use the full data source for the other objects? For the Geo Map object, I wanted to display the geography hierarchy of Country – State. My aggregated data source is aggregated to the Continent level and I wanted more detail for the Geo Map instead of a single bubble.

 

21_GeoMap.png

 

 

The similar is true for the Gauges at the bottom of the report. This object uses the Product Line data item for the Group role. I did not want that level of granularity in summary table.

 

22_GroupedGauges.png

 

 

Key Points about using an Aggregated Data Source

  • The aggregated data source is a temporary table created when the report is opened. This temporary table is using the individual’s CASUSER caslib location.
    • If the aggregated data source is “small” and used to feed multiple report objects, then you could potentially improve report performance.
    • If you create many aggregated data sources, one per report object, then you could potentially degrade report performance.
    • If the aggregated data source is essentially the same size as the original data source, then you could potentially degrade report performance.
  • Able to add a filter to subset the aggregated data source.
  • Measure aggregations are based on the original data source so make sure your aggregation type is set before you create your aggregated data source.
  • Not able to include aggregated measures from the original source data.

 

 

Comments

hello Teri,

very good example and it shows how you should think as a developer of dashboards in VA. I have two questions:

1) how to create column sales rep actual, if you have a column with all the years in one

2) could you add here how you joined the aggregated data and your original table. Your example would be complete then. At this moment I know something but cannot execute to the end. 

thank you

Hi Touwen,

1) Can you expand a bit more on your question about how to create the column Sales Rep Actual if the years are all in one.  Can you upload a picture of the structure of your data and then a target of what you want?

2) I have an example of the Data Join functionality here:  https://communities.sas.com/t5/SAS-Communities-Library/VA-8-3-Join-Tables-in-VA-Reports/ta-p/499974 

 

Thank you,

Teri

hello Teri,

Point 1 is not a problem any more once I added year prompt. Thank you.

2) if you join original source with aggregated table, do you perform usually left join and put all the columns? I have never done it yet. It will be my first aggregation

 

I have page prompt going from broad category to the smallest category and one aggregated measure number of customers where I need to calculate average or moving average. As of nested aggregation, I am not able to do it. Then when performing aggregation do I need to all of the hierarchies of categories as these will be my page prompts I want to filer on? regards Karolina Touwen

Hi Karolina,

 

Here is a link to the SAS Documentation which explains the types of joins and their results:  Working with Data Source Joins in Reports   So it depends on why you want to join your aggregated data source back to the original data source.

 

From your question on needing to perform a moving average on the number of customers -- this doesn't seem like a number you would want to join back to the original data source but rather just use the aggregated data source to feed the report object.  Will you plot this over time or use a crosstab to display your hierarchy?

 

You may also find this article helpful:  SAS Visual Analytics example: moving average 

 

I took a screenshot from the article and added some callouts based on your question.

 

AggDataSource-MovingAverage.png

 

Thank you,

Teri

hello Terri,

 

thank you very much. It is amazing that you made this screenshot with callouts. I understand now that mapping is enough. I will try to create it.

hello Teri, the request is to have many filters with categories. If I add the aggregated table, it will calculate the ratio (for example sum net profit margin/ sum net sales)properly for the most detailed category. But this will be untrue for the broader category. How can aggregated table work with more than one prompt for category? regards Karolina T

Version history
Last update:
‎09-28-2018 01:12 PM
Updated by:
Contributors

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags