Do you want your control objects in your SAS Visual Analytics report to go from
to ?
Then read on. When a VA report is built on a very large CAS table and it has quite a lot of control objects like drop-down lists we can in some cases experience that it takes a little while for the control-objects to be populated with data as seen in the first animated gif above. Report user may accept to wait a short while for the main visualization in a report after they have selected their desired filters, but even a slight delay in the control objects quickly grows annoying for the report users as selecting multiple filters can take some time due to this small delay after they have selected one filter and before they can select the next one. I've seen this happen a few times where one has the very large table in SpeedyStore and use INPLACEPREFERRED in the CASLIB in combination with one-way or two-way automatics interactions and many control objects as each change in any of the control objects triggers a query for each and every other object on the page to fetch new data. For the best user experience we want to populate those filters as fast as possible, preferably without any noticeable waiting time for the user, and this tips shows how one can do so.
There are two methods we can use which can help reduce and possibly even eliminate the time spent displaying the "Loading" message. They are related as both seeks to minimize the data needed to populate the control objects and also to keep that data in CASDISKCACHE (Read: CAS memory) rather than down in SpeedyStore. I'll describe below how to do them both as well as their advantages and disadvantages.
Option 1:
Create one or several Aggregated tables in VA based on our main table.
Select some or all of the categories we use in our filters.
This will create the aggregated table when the report is opened and it will store the aggregated table in the CASUSER caslib.
When developing the report I like to just check the number of rows - easiest done by looking at the Frequency measure in a list table to ensure that the table isn't too large. If it's too large consider "edit the aggregated data" and remove some categories that are either the last the user will select and/or has the highest cardinality,
In this case the number of rows in the aggregated table is the same as what I have in my main table, so I remove the columns Product (due to the cardinality being 895K - the same as the number of rows in my main table) and TransactionDate (a filter i expect the user to use infrequently or to select last).
That reduced the number of rows in the aggregated table to 59,688. A significant reduction in number of rows and also only 10 columns vs 59 in the main table. Remember that in order to get a performance gain we need this aggregated table to be much smaller than the main report table.
Use the aggregated table(s) instead of our main report table to populate the control objects. Easiest way to do that is right-click on the control object and select "Remove all role assignments" and then make sure we have the aggregated table selected in the data panel and click "Assign data" on the control object and select the appropriate column from the aggregated table:
If we haven't renamed any of the columns in the aggregated table the mapping between the main report table and the aggregated table(s) will be done automatically. This can be verified in the datasource->Map data menu.
If we renamed some columns in the aggregated table we'll need to update the mapping accordingly. This mapping ensure that control objects populated by one table can filter the other table's data as long as the column being filtered is in the mapping. Since we didn't include the Product in our aggregated table we probably will still experience the "Loading" icon on the Product listbox, but despite that the Report as a whole feels much faster and responsive for the end user.
Advantages:
Relatively easy to do as it can be done from within Visual Analytics.
Using the control objects is much snappier and the report as a whole feels faster
Disadvantages:
Has a negative impact on startup time for the report. The report won't render until the aggregation is done and the aggregated table are stored in CASUSER.
Each concurrent user of reports using this technique will create their own copy of the aggregated table in their own CASUSER. If the aggregated table are also large and the number of concurrent user is high then this can possibly exhaust CAS memory.
Option 2:
A variation of option 1, but instead of using the Aggregate data functionality in Visual Analytics we'll use SAS Studio to create the aggregated table and load them once into one CASLIB where it can be read by all users. We will use the same CASLIB as the one where the main report table is.
We can easily create an aggregated table in SAS Studio using the Query step, selecting our columns and then selecting that we want distinct rows and we have places the output table in the same SpeedyStore library as our main table:
The next few steps are specific for the case where the main table is in a SpeedyStore CASLIB.
Generating the aggregated table and store it in the same SpeedyStore library/database and then load them to the CASLIB the normal way using INPLACEPREFERRED would probably give some performance gain due to the queries sent from VA to SpeedyStore for populating the control objects would execute on this smaller table. But the query would still need to be sent to SpeedyStore and the result would still need to be sent back to CAS/VA.
If the aggregated tables are quite small, we could get even better performance by loading the aggregated tables to the same CASLIB as the main report table while overriding the INPLACEPREFERRED option and use CASDISKCACHE instead. This means that the aggregated table with data would be copied to memory in CAS. Then we avoid having to send the query down to SpeedyStore to populate the control objects, it is all done in CAS. Here is a code example of how to load a physical copy of the data in the SpeedyStore table/view to a SpeedyStore CASLIB by using CASDISKCACHE as backingstore instead of the "INPLACEPREFERRED" backingstore that is the default for SpeedyStore CASLIBs.
proc casutil sessref=mysess incaslib="<CASLIB name>" outcaslib="<CASLIB name>";
droptable CASDATA="<Tablename>" quiet; /* drop session table or alternatively global table */
droptable CASDATA="<Tablename>" quiet; /* drop global table it still exists due to us having both an session and global table with same name */
load casdata="<Tablename>" CASOUT="<cas table name>" /* use the same name for casdata and casout */
DATASOURCEOPTIONS=(
backingStore="CASDISKCACHE"
)
promote;
quit;
When the aggregated table in SpeedyStore is loaded to CAS we can add it to the report with "Add data" and then assign the appropriate columns from the aggregated CAS table to the appropriate control objects. In this case there won't be an automatic mapping, so here we will need to map those categories ourselves in the "Map data" dialog box. This is easy if we have kept the same column names as we only need to click the + icon once for each category column in our aggregated table. It will automatically suggest the correct mapping if the column names are the same in both tables. We shouldn't map the Frequency and Frequency Percent columns.
Advantages:
No impact on startup time of the report as the aggregated table don't need to be recreated whenever the report is opened.
Only one copy of the aggregated table is kept in CAS memory even when there are many concurrent users of the report.
Using the control objects is much snappier and the report as a whole feels faster.
Disadvantages:
We must create the aggregated table outside SAS Visual Analytics by using SAS Studio.
Whenever the main table is updated we must also recreate the aggregated table.
... View more