Recently, the requirement came to me for a moving 30 day sum. As a starting point, I needed to understand the business requirement, and nothing speaks louder than a visual.
Let’s use a straight forward time dimension to illustrate the requirement. Here we are looking at the all the days for the year 2011. The 30 day rolling sum requirement will be for the measure Day of Year and translates to:
To double check my 30 day rolling sum is returning the expected results, I added a few List Table objects to aggregate Day of Year and applied a filter for date to match the appropriate 30 day rolling window. I added a custom title to each List Table for clarification and highlighted the corresponding sums in matching colors.
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
Success! Now, let’s look at how I created the calculated measure 30 Day Sum. I used the aggregated tabular operator AggregateCells from the Calculated item window.
Next, I configured the AggregateCells operator:
You may be having a light bulb moment right now. Why? I hope it is because you picked up on that I said the starting point and ending points are relative cell positions.
Why does this matter? Well, I did name this post “Moving 30 Day Rolling Sum” but, given what you've seen, does this aggregate measure take into account actual date values? No. The behavior of this operator is to aggregate based on the number of cells between the start and end points.
Key Point That means you will need to be conscious of how your data is aggregated in your report objects, if your data has uninterrupted contiguous dates and if you are okay with that aggregation behavior.
In the previous example, I demonstrated the solution using a straight forward time dimension. My time dimension had one value per day and no days were missing.
Let’s look at my fictitious MegaCorp data. I want to point out that this company is closed on Sunday. Notice there are no entries for transactions on Sunday. If your business requirement is to aggregate a rolling 30 day sum for only the days the company is open, then you can implement the AggregateCells solution I outlined above. But be sure to only include the columns that aggregate up to the level for which you want the 30 cell aggregation applied; in other words, make sure your object aggregates up to the day level.
If, however; your data is inconsistent and you need an actual 30 day regardless if transactions occurred on those days, then you will need to perform a few more steps. Continue reading.
Here is the technique to implement a consecutive 30 day rolling sum if your data has gaps:
Since you are still reading, you must have gaps in your data. To remedy this, we are going to create a contiguous time dimension that we can join to your “gapy” data.
I reached out to my data management colleague, Stephen Foerster, who quickly pointed me to this helpful SAS Communities post: How to create date series from start and end dates.
Open SAS Studio V by using the Side Menu and select Develop SAS Code.
Next, copy and paste the code into a new program. I modified the code a little from the original SAS Communities post so that my results get promoted into my CAS library. You must have the write permission granted on the CAS library in order to promote tables.
Use the macros at the top for start_date and end_date to specify how large you want your time dimension to be to pair with your data. I included a month before and after my data source dates.
%let start_date=01Dec2008; %let end_date=31Jan2012; cas mySession sessopts=(caslib=casuser timeout=1800 locale="en_US"); libname VISUAL cas caslib="VISUAL"; data VISUAL.MegaCorpTimeDim; do date="&start_date"d to "&end_date"d; output; end; format date date9.; run; proc casutil outcaslib="VISUAL" incaslib="VISUAL"; promote casdata="MegaCorpTimeDim"; quit; run; *cas mySession terminate;
Use the Explore and Visualize option from the Side Menu to return to SAS Visual Analytics. Next, make sure your report has both your newly created time dimension table and the data you want for the 30 day rolling sum added as data sources.
Then, with your time dimension data source active, from the Data pane use the data source menu and select New data source from join...
Next, configure your data source join as shown below. We are performing a left join where the time dimension table is the left table and the MegaCorp table is the right. This is so that if you have any missing days in your data, those dates will be included in the join result. The join condition is where the date data items are equal and include all other data items from your source data.
Let’s verify our results. In the below report page, Verify Join, you can see that I have List Table objects for both the MegaCorp data source and the result of the left join named Data Source Join.
In the Data Source Join List Table object, you can see the leading days that come from the time dimension table but more importantly, you can also see entries for missing Sundays. Below the List Table objects, you can see that I’ve included the frequency for both data sources which essentially serves as my row count.
By creating a few calculated data items in the Data Source Join table, I can compare the number of rows in both the MegaCorp data source and the join result. Since I had a lead of 31 days in December and trail of 31 days in January in my time dimension, I expect to see an extra 62 days of rows plus the addition of the missing Sundays. Given there are 52 Sundays in a year, and I have 3 years’ worth of data, I would expect to see an additional 62 + 3*52 = 218 rows in the join result data. If you look at the bottom List Table object, you can see that I do indeed have a difference of 218 rows. I can feel confident that my join result is accurate.
Now we can create our 30 Day Sum for Profit. This is configured similar to when we were looking at the time dimension example. From the Data pane with your data source join result table active, use the New data item menu and select Calculated item.
Configure the AggregateCells expression like before for Profit. Be sure to be mindful of the starting and ending point values.
When assigning roles to the List Table object, be sure to use the date data item that comes from the time dimension table, otherwise you’ll end up back where you started with gaps in your data.
I also applied a Heat map cell graph so that I can quickly see if any 30 day sum stands out from the rest. I cover both types of cell graphs, Heat map and Bar, in my post: Use SAS VA Cell Graphs to enhance List Tables which is now available in VA 7.5 too!
And similar to the time dimension example, I included a few List Table objects with filters to double check the calculations return what I expect.
When using the AggregateCells operator, the major consideration is that it aggregates by cells and not by a dynamically assigned date/datetime period. Therefore, you must make sure that
The other consideration is the performance and table creation by-products of the report data join result. Recall from my original post that report data joins execute for every user, every time the report is opened, and that resultant table is stored in the user’s CASUSER CAS library. Stephen Foerster also covers careful considerations for using the report data join functionality in this article. You can avoid the report data join step by preparing your data to fill in any time gaps before it is loaded for use in Visual Analytics.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.