VA Report Example: Moving 30 Day Rolling Sum

Started ‎10-17-2019 by
Modified ‎10-17-2019 by
Views 5,323

Recently, the requirement came to me for a moving 30 day sum. As a starting point, I needed to understand the business requirement, and nothing speaks louder than a visual.

Let’s use a straight forward time dimension to illustrate the requirement. Here we are looking at the all the days for the year 2011. The 30 day rolling sum requirement will be for the measure Day of Year and translates to:

• SUM(Day of Year) for 01Jan2011 to 30Jan2011 – start and end rows are highlighted in pink
• SUM(Day of Year) for 02Jan2011 to 31Jan2011 – start and end rows are highlighted in purple
• SUM(Day of Year) for 03Jan2011 to 01Feb2011 – start and end rows are highlighted in teal
• etc… till the end of the data.

To double check my 30 day rolling sum is returning the expected results, I added a few List Table objects to aggregate Day of Year and applied a filter for date to match the appropriate 30 day rolling window. I added a custom title to each List Table for clarification and highlighted the corresponding sums in matching colors.

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

Success! Now, let’s look at how I created the calculated measure 30 Day Sum. I used the aggregated tabular operator AggregateCells from the Calculated item window.

Next, I configured the AggregateCells operator:

• I selected the _Sum_ aggregation
• I assigned Day of Year as my measure to be aggregated
• I left the direction as default which for List Table objects will default to column
• For the starting point, I used the current cell position and move 0 cells, so basically start with the cell you are on
• For the ending point, I used the current cell position and move forward 29 cells for the aggregation. The direction is forward since 29 is positive, if I wanted to move backward it would have been -29. Also, keep in mind your computer science counting knowledge, as we start at zero and not 1, which is why the value isn't 30.

You may be having a light bulb moment right now. Why? I hope it is because you picked up on that I said the starting point and ending points are relative cell positions.

Why does this matter? Well, I did name this post “Moving 30 Day Rolling Sum” but, given what you've seen, does this aggregate measure take into account actual date values? No. The behavior of this operator is to aggregate based on the number of cells between the start and end points.

Key Point That means you will need to be conscious of how your data is aggregated in your report objects, if your data has uninterrupted contiguous dates and if you are okay with that aggregation behavior.

Mega Corp Example

In the previous example, I demonstrated the solution using a straight forward time dimension. My time dimension had one value per day and no days were missing.

Let’s look at my fictitious MegaCorp data. I want to point out that this company is closed on Sunday. Notice there are no entries for transactions on Sunday. If your business requirement is to aggregate a rolling 30 day sum for only the days the company is open, then you can implement the AggregateCells solution I outlined above. But be sure to only include the columns that aggregate up to the level for which you want the 30 cell aggregation applied; in other words, make sure your object aggregates up to the day level.

If, however; your data is inconsistent and you need an actual 30 day regardless if transactions occurred on those days, then you will need to perform a few more steps. Continue reading.

Here is the technique to implement a consecutive 30 day rolling sum if your data has gaps:

• Create a time dimension for the days you need to report on
• Use the report data join feature to left join the time dimension to your data source
• Create an aggregated measure using the AggregateCells operator for data join result
• Create a report object at the correct daily aggregation level for the 30 day rolling sum calculation.

Step 1: Create a time dimension for the days you need to report on

Since you are still reading, you must have gaps in your data. To remedy this, we are going to create a contiguous time dimension that we can join to your “gapy” data.

I reached out to my data management colleague, Stephen Foerster, who quickly pointed me to this helpful SAS Communities post: How to create date series from start and end dates.

Open SAS Studio V by using the Side Menu and select Develop SAS Code.

Next, copy and paste the code into a new program. I modified the code a little from the original SAS Communities post so that my results get promoted into my CAS library. You must have the write permission granted on the CAS library in order to promote tables.

Use the macros at the top for start_date and end_date to specify how large you want your time dimension to be to pair with your data. I included a month before and after my data source dates.

``````%let start_date=01Dec2008;
%let end_date=31Jan2012;

cas mySession sessopts=(caslib=casuser timeout=1800 locale="en_US");
libname VISUAL cas caslib="VISUAL";

data VISUAL.MegaCorpTimeDim;
do date="&start_date"d to "&end_date"d;
output;
end;

format date date9.;
run;

proc casutil outcaslib="VISUAL" incaslib="VISUAL";
promote casdata="MegaCorpTimeDim";
quit; run;

*cas mySession terminate;
``````

Step 2: Use the report data join feature to left join the time dimension to your data source

Use the Explore and Visualize option from the Side Menu to return to SAS Visual Analytics. Next, make sure your report has both your newly created time dimension table and the data you want for the 30 day rolling sum added as data sources.

Then, with your time dimension data source active, from the Data pane use the data source menu and select New data source from join...

Next, configure your data source join as shown below. We are performing a left join where the time dimension table is the left table and the MegaCorp table is the right. This is so that if you have any missing days in your data, those dates will be included in the join result. The join condition is where the date data items are equal and include all other data items from your source data.

Let’s verify our results. In the below report page, Verify Join, you can see that I have List Table objects for both the MegaCorp data source and the result of the left join named Data Source Join.

In the Data Source Join List Table object, you can see the leading days that come from the time dimension table but more importantly, you can also see entries for missing Sundays. Below the List Table objects, you can see that I’ve included the frequency for both data sources which essentially serves as my row count.

By creating a few calculated data items in the Data Source Join table, I can compare the number of rows in both the MegaCorp data source and the join result. Since I had a lead of 31 days in December and trail of 31 days in January in my time dimension, I expect to see an extra 62 days of rows plus the addition of the missing Sundays. Given there are 52 Sundays in a year, and I have 3 years’ worth of data, I would expect to see an additional 62 + 3*52 = 218 rows in the join result data. If you look at the bottom List Table object, you can see that I do indeed have a difference of 218 rows. I can feel confident that my join result is accurate.

Step 3: Create an aggregated measure using the AggregateCells operator for data join result

Now we can create our 30 Day Sum for Profit. This is configured similar to when we were looking at the time dimension example. From the Data pane with your data source join result table active, use the New data item menu and select Calculated item.

Configure the AggregateCells expression like before for Profit. Be sure to be mindful of the starting and ending point values.

Step 4: Create a report object at the correct daily aggregation level for the 30 day rolling sum calculation

When assigning roles to the List Table object, be sure to use the date data item that comes from the time dimension table, otherwise you’ll end up back where you started with gaps in your data.

I also applied a Heat map cell graph so that I can quickly see if any 30 day sum stands out from the rest. I cover both types of cell graphs, Heat map and Bar, in my post: Use SAS VA Cell Graphs to enhance List Tables which is now available in VA 7.5 too!

And similar to the time dimension example, I included a few List Table objects with filters to double check the calculations return what I expect.

Considerations

When using the AggregateCells operator, the major consideration is that it aggregates by cells and not by a dynamically assigned date/datetime period. Therefore, you must make sure that

• the data has the required continuous cells you need to aggregate by
• the report object aggregates to the level you want to aggregate the cells by.

The other consideration is the performance and table creation by-products of the report data join result. Recall from my original post that report data joins execute for every user, every time the report is opened, and that resultant table is stored in the user’s CASUSER CAS library. Stephen Foerster also covers careful considerations for using the report data join functionality in this article. You can avoid the report data join step by preparing your data to fill in any time gaps before it is loaded for use in Visual Analytics.

hello Terri, aggregate cells works very nice in a list table. If there is a request to visualize 30 days as a single sum in crosstab, how can you retrieve this rolling sum, so that business sees only the sum as a single value and not the whole table? regards Karolina

Hi Karolina,

If you want a metric to represent the aggregation of a measure for the last 30 days to be visualized in a Crosstab, you can create a new data item with a condition to check to see if the date falls into the 30 day window.

Here are some screenshots to help.

Thank you,

Teri