SAS Visual Analytics Advanced Calculations (part 3 of 4): Data Source Operations

4 Likes

This is the third part of a series of articles about advanced aggregations and calculations. In the previous one we covered the AggregateTable operator, and in this one you will explore aggregated data sources and data source joins.

Both aggregated data sources and data source joins take tables as input and generate temporary tables as output. Those temporary output tables can be used as source tables for visualizations and/or input for subsequent aggregated data sources and data source joins.

Aggregated data source

Aggregated data source was introduced in SAS Visual Analytics 8.3 and the concept behind it is simple: dynamically create an aggregated table that can be used as a data source in the current report. Because the aggregated data source is created on the fly and it is visible only by the user that is consuming the report, it works as a calculation.

Creating an aggregated data source

To create an aggregated data source, you select the input source table from the Data pane on the left, click on the Actions icon, and select New aggregated data source:

01-Aggregated data source menu

The New Aggregated Data Source window that pops up is straight forward. Just select the columns (categories and measures) that you want in the resulting table by moving them from the Available items on the left to the Selected items on the right and hit OK:

02-Aggregated data source window

The output table produced is equivalent to what the following SQL statement would produce (this is just for reference – there is no SQL query executed):

select <categories and aggreg(measures)> from <input table> where <filter> group by <categories>

The aggregations used with the selected measures are the aggregations assigned to the measures on the Data pane. Calculated items and aggregated measures can also be selected and if they use parameters, the expressions are evaluated on the fly. Calculated items and aggregated measures become regular measures in the output table and can be used in new aggregated items and aggregated measures expressions. This is another way to allow “aggregated measure of an aggregated measure”, discussed in the previous article. Custom categories are also accepted.

Observe that on the top right corner of the New Aggregated Data Source window there is the + New filter option that allows you to add data source filters. If you choose to add an Advanced filter, you can use complex filter expressions that includes parameters.

Using parameters in calculated items, aggregated measures, and data source filters is what makes the aggregated data source dynamic, flexible, interesting, and powerful, differentiating itself from a table that is aggregated upfront, as a data prep process.

For more information on aggregated data sources, please refer to the documentation.

Data source join

Data source join was also introduced in SAS Visual Analytics 8.3 and as you may be thinking, it allows for source tables to be joined, creating an output table that can be used in the report. Like the aggregated data source, the data source join is also performed on the fly and the output table is visible only by the user that is consuming the report, similarly to a calculation.

Like in the aggregated data source, calculated items and aggregated measures become regular measures in the output table and can be used in new aggregated items and aggregated measures expressions. Calculated items and aggregated values cannot be used as join keys, but if you derive aggregated source tables from then, they can be used as join keys.

The best way to understand data source join is through an example (see examples 2, 3, and 4).

For additional information on data source joins, please refer to the documentation.

Examples

Example 1 – Aggregated data source to improve performance

Those two crosstabs below are identical, but one of them is based on data coming from the original data source, and the other is built from the aggregated data source that has only selected columns Date by Month, Product Brand, Expenses, and Order Quantity.

03-Example 1: Aggregated data source to improve performance

The time required to query the data can be analyzed in the SAS Visual Analytics Diagnostics window (via Ctrl+Alt+P). In this example, the time dropped from almost 8 seconds to just a bit over 1 second.

04-Example 1: Aggregated data source to improve performance

The difference in performance varies from report to report, each case is different, and it’s not guaranteed that the use of aggregated data sources will always improve performance. For example, if the previous crosstabs only had Expenses as the measure (which is not an aggregated measure like the others), there wouldn’t be any significant performance gain:

05-Example 1: Aggregated data source to improve performance

Usually, reports that have multiple objects in a single page may experiment performance boosts if the objects can leverage the same aggregated data source that is a lot smaller than the original table. The performance improvement comes from the fact that the original data source table is filtered and aggregated once, instead of multiple times (for each object).

Example 2 – Data source join to bring data from multiple tables into one report object

Even though reports can have multiple data source tables, a report object can have its data coming from one source table only. The ability to join tables directly in SAS Visual Analytics helps overcome this limitation.

For example, the following data source table has actual sales numbers and the one below it has sales target numbers:

06-Example 2: Sales table

07-Example 2: Sales target table

To combine actual sales and target sales in one single report object you must join those two tables. You can do that upfront when preparing the data for reporting, or you can do that in SAS Visual Analytics, by creating a data source join:

08-Example 2: Data source join menu

In this example a right join was selected to make sure the target sales values are present even if there were no actual sales. Those two tables were joined by Product type, Quarter, and Year:

09-Example 2: Data source join definition

Clicking on the button Choose Columns… at the bottom left (picture above) allows you to select the columns you want in the joined output table:

10-Example 2: Data source join columns selection

The joined output table is now available on the Data pane and as any other data source table, you can apply data source filters (parameterized or not), join it with other tables, use it as the input of an aggregated data source, etc., and of course use both actual sales and target sales in the same report object:

11-Example 2: Output data source join

12-Example 2: Output data source join

If you have tables in different aggregation levels, for example: one table has daily transactional sales information at the product level and the other table has quarter sales goals at the product family level, you might want to aggregate the transactional sales table by creating an aggregated data source first, before joining it with the quarterly goals. There are other use cases that require a combination of aggregated data source and data source join, and this is what example #3 is about.

Example 3 – Aggregated data source and data source join to implement nested “select from”

Suppose you have a table with historic information about people, or products, etc. and certain conditions that those people or products might have been at a certain moment. Given one or more conditions, you want to retrieve all historic information from those people or products that are or were at that condition. That’s a very broad use case that could fit some similar use cases:

Patients and diagnosis: given a diagnosis, list all patients and their diagnosis that occurred together with the selected diagnosis.
Products and defects: given a defect, list all products and their defects that occurred together with the selected defect.
Accounts and transactions: given a type of transaction, list all accounts and their types of transaction that occurred with the selected type of transaction.
Managers and employees: given an employee, list all other employees that share the same manager of the selected employee.
etc.

If you had to write a SQL statement to resolve this query, it would be something like this:

Select * from <historic_table> where id in (select id from <historic_table> where condition in <selected conditions>)

In this example you have a table called TINYTABLE with three columns: ID (to identify the person or product), Condition, and Version (to indicate different moments). If condition B is selected, the goal is to return all records from ID’s 1 and 2 because they are the ID’s that contain condition B at some point:

13-Example 3: Input table

The first thing you need to do is resolve the inner select statement to identify the ID’s that meet the selection criteria. This is easily accomplished by creating an aggregated data source where only the column ID is included in the output:

14-Example 3: Aggregated data source to select IDs

Observe that the aggregated data source above has an advanced filter (top right) associated with it that filters for selected conditions only. In the filter expression, Selected Conditions is a multi-value parameter that could have been assigned to a List control object, for example. This is how the filter expression is defined:

15-Example 3: Advanced filter for aggregated data source

The next step is to join the list of affected ID’s (the aggregated data source) with the original source table by ID, to create a table called Data_Source_Join:

16-Example 3: Data source join with aggregated data source

The Data_Source_Join table contains the records that you are looking for and could be used in the report object (example when condition B is selected):

17-Example 3: Final output table

There is only one caveat in this solution: to guarantee that the data source join is executed every time a new condition is selected, we need to add a data source filter to the Data_Source_Join table. This filter is just to establish a direct dependency between the parameter called Selected Conditions and the final joined table, so that changes on the parameter triggers the join to be re-evaluated. This filter could be anything that uses the parameter and always resolves to true, such as this:

18-Example 3: Dummy filter for final output table

Example 4 – Cartesian joins

Cartesian joins can be dangerous, especially if you don’t know what you’re doing, but there are some cases where you might want or need to do that. Out of the box, cartesian join is not an option for the join type, but you can create a calculated item in SAS Visual Analytics with a constant value assigned to it and use that calculated item as the join key in an inner join. You will need to create that calculated item in both input tables.

But then there is a catch: calculated items cannot be used as join keys. You can go around this by creating aggregated data sources first and then joining those aggregated tables. If you remember from the beginning of this article, calculated items and aggregated measures behave as regular measures in the output table of an aggregated data source, but they are still dynamically calculated.

In the next and last article, you will work with an example that leverages most of the advanced aggregations and operations explained in this series and solve a complex calculation problem.

Additional resources on advanced calculations and aggregations:

Documentation: Working with Aggregated Data Sources
Documentation: Working with Data Source Joins in Reports
Community article: SAS Visual Analytics Advanced Calculations (part 1 of 4): AggregateCells by SAS' Renato Luppi
Community article: SAS Visual Analytics Advanced Calculations (part 2 of 4): AggregateTable by SAS' Renato Luppi
Community article: SAS Visual Analytics Advanced Calculations (part 4 of 4): PMPM Example by SAS' Renato Luppi