
Creating Analytical Variables for Modeling Purposes Using Dynamic Aggregations From Timeseries

Started ‎11-18-2022 by
Modified ‎11-18-2022 by

Preamble

 

This article introduces DAFT, a new SAS Custom Step available in the public SAS Custom Step repository on GitHub. DAFT stands for Dynamic Aggregations From Timeseries. It creates analytical variables from time series data at the click of a button, for use in modeling and other analytical work. The following animation shows a potential DAFT workflow; the rest of this article covers the details.

 

Potential DAFT Workflow

 

 

A New Data Wrangling Tool

 

Creating a variety of analytical variables for modeling purposes, e.g., for forecasting or prediction models, is crucial for building a good scoring algorithm. The Dynamic Aggregations From Timeseries (DAFT) SAS Studio Custom Step lets SAS Studio Flow users perform dynamic aggregations on time series data at the push of a button.

 

What Does Dynamic Exactly Mean?

 

Let's explain with an example. The outcome of an event often depends on earlier events, so it is important to look at historic views of the data, say from four weeks ago, seven weeks ago, and so on. In addition, how does the data look when it is aggregated over two weeks, three weeks, and so on?

 

Real-life examples would be:

 

  • When looking at next product recommendations or purchase predictions, one important variable is spending behavior, so questions like the following matter: What was the client's behavior before they signed up for a credit card, e.g., how much did they spend over one month, three months before the sign-up? It is probably a good idea to feed all kinds of combinations into the model and let the model decide which combination(s) are important for the desired outcome (the dependent variable).
  • When looking at harvest predictions for specific crops, one important variable is precipitation, so questions like the following matter: What was the precipitation over two weeks, six weeks before the harvest date? Often it is not known which combinations of aggregation and lag are important, so offering the model a variety of options and letting AI decide which combination to use can be crucial.

 

As the examples above show, it is often not known in advance which time parameters are relevant. It can therefore be important to create many combinations and let the statistical procedure decide which ones are influential.

 

DAFT can calculate a large number of combinations if necessary. It currently supports the following aggregation functions:

 

  • sum
  • mean
  • min
  • max

 

The aggregations are based on one of the following time units:

 

  • day
  • week
  • month
  • quarter

 

Because time series data is usually very granular, aggregating it to a higher level is often necessary to get the best results for analytical purposes. Which granularity to choose depends on the problem.

 

The output dataset is then made available based on that chosen granularity.
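
To illustrate what aggregating granular data to a coarser time unit means, here is a minimal Python sketch. It is not DAFT's implementation (DAFT is a SAS Studio Custom Step); the sample readings, the function name `aggregate_by_week`, and the use of ISO weeks to define a week are all assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical minute-level readings: (entity, timestamp, precipitation).
readings = [
    ("county_a", datetime(2022, 11, 7, 8, 0), 0.25),    # Monday, ISO week 45
    ("county_a", datetime(2022, 11, 7, 8, 1), 0.5),
    ("county_a", datetime(2022, 11, 13, 23, 59), 1.0),  # Sunday, ISO week 45
    ("county_a", datetime(2022, 11, 14, 0, 0), 0.5),    # Monday, ISO week 46
]


def aggregate_by_week(rows):
    """Group values by (entity, ISO year, ISO week) and compute four statistics.

    ASSUMPTION: weeks follow the ISO 8601 definition; DAFT may define
    weeks differently.
    """
    buckets = defaultdict(list)
    for entity, timestamp, value in rows:
        iso = timestamp.isocalendar()  # (ISO year, ISO week, ISO weekday)
        buckets[(entity, iso[0], iso[1])].append(value)
    return {
        key: {
            "sum": sum(values),
            "mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in buckets.items()
    }


weekly = aggregate_by_week(readings)
for key, stats in sorted(weekly.items()):
    print(key, stats)
```

The result has one row per entity and week, which mirrors the idea that the output granularity follows the chosen time unit.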

 

Looking at a DAFT Example Setting

 

Weather data is available by the minute, and the problem at hand requires us to look at the data on a weekly basis. We need the total precipitation over one week and over two weeks, each as of four weeks ago and as of eight weeks ago. Translated into DAFT terms, this means:

 

  • The aggregation sequence that needs to be provided is: 1#2
  • The lag sequence that needs to be provided is: 4#8

 

DAFT then creates all combinations of the aggregation and lag sequences, and the output variables look like this:

 

  • precipitation_sum1L4
  • precipitation_sum1L8
  • precipitation_sum2L4
  • precipitation_sum2L8

 

Here, "sum" is the statistic applied to the variable, the number after it is the length of the aggregation window in the selected time unit, and the number after "L" is the lag.
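
The way the two sequences combine into variable names can be sketched in a few lines of Python. This only reproduces the naming pattern described above; the helper names `parse_sequence` and `variable_names` are hypothetical and are not part of DAFT.

```python
from itertools import product


def parse_sequence(spec):
    """Split a '#'-separated sequence such as '1#2' into a list of integers."""
    return [int(part) for part in spec.split("#")]


def variable_names(variable, statistic, aggregation_spec, lag_spec):
    """Build one name per (aggregation length, lag) combination."""
    lengths = parse_sequence(aggregation_spec)
    lags = parse_sequence(lag_spec)
    return [
        f"{variable}_{statistic}{length}L{lag}"
        for length, lag in product(lengths, lags)
    ]


names = variable_names("precipitation", "sum", "1#2", "4#8")
print(names)
# ['precipitation_sum1L4', 'precipitation_sum1L8',
#  'precipitation_sum2L4', 'precipitation_sum2L8']
```

Because every aggregation length is paired with every lag, the number of output variables per statistic is the product of the two sequence lengths, so longer sequences grow the variable count quickly.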

 

Since the granularity is "By Week", DAFT creates the following two time-variables:

 

  • _DAFT_year
  • _DAFT_week

 

Additionally, the output dataset contains the variables that describe the entity. In the weather example, this could be the region or county level, the zip code level, etc.

 

In other examples, e.g., when the transaction data is bank data, the smallest entity could be a person, household, company, or parent company.
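
To make the weather example concrete, the following Python sketch computes lagged window sums for a single entity's weekly series. It is an illustration under stated assumptions, not DAFT's actual logic: the function `lagged_window` and the sample values are hypothetical, and the exact alignment of the window (here it covers the `length` periods ending `lag` periods before the current one) is an assumption, since the alignment DAFT uses is not spelled out above.

```python
def lagged_window(values, length, lag, func=sum):
    """For each position t, aggregate `length` periods ending `lag` periods before t.

    ASSUMPTION: the window covers periods t-lag-length+1 .. t-lag.
    Positions without enough history yield None.
    """
    result = []
    for t in range(len(values)):
        end = t - lag + 1      # exclusive slice end; last included period is t - lag
        start = end - length   # first included period
        result.append(func(values[start:end]) if start >= 0 else None)
    return result


# Hypothetical weekly precipitation totals for one entity, oldest week first.
weekly_precipitation = [10, 2, 5, 20, 15, 3, 30, 25, 5, 10]

# Build the four variables from the example for the most recent week.
latest = {}
for length in (1, 2):
    for lag in (4, 8):
        series = lagged_window(weekly_precipitation, length, lag)
        latest[f"precipitation_sum{length}L{lag}"] = series[-1]

print(latest)
# {'precipitation_sum1L4': 3, 'precipitation_sum1L8': 2,
#  'precipitation_sum2L4': 18, 'precipitation_sum2L8': 12}
```

Passing `func=max`, `func=min`, or a mean function instead of `sum` would give the other statistics in the same way. In a full dataset the same calculation would run separately per entity, e.g., per county or per customer.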

 

And for SAS Viya 4, it is easy to get your hands on DAFT

 

DAFT is available as a custom step, which means you only need to upload the “step” file somewhere in SAS Content (see detailed upload instructions here), and it is automatically available under “Shared” Steps in SAS Studio.

 

The DAFT SAS Studio Custom Step can be downloaded here.

 

Interacting With DAFT

 

The following screenshots show the user experience when using the custom step. Each screenshot shows one tab of the custom step.

 

Input Data Tab

 

The “Input Data” tab contains all the parameters that determine which variables are used and in which role. The options are spread out over two screenshots:

 

DAFT Input Data Tab Part 1

 

DAFT Input Data Tab Part 2

 

 

Output Data Tab

 

This tab determines the output granularity.

 

DAFT Output Data Tab

 

Possible options are: 

 

  • By Day
  • By Week
  • By Month
  • By Quarter

 

Processing Options Tab

 

This is the tab where you set which combinations of aggregations and lags DAFT should produce.

 

DAFT Processing Options Tab

 

Admin Options Tab

 

This tab provides various settings that control process execution.

 

DAFT Admin Options Tab

 

 

About Tab

 

This tab contains all the necessary information for using DAFT. It also contains a description of all the parameters and some sample code that produces an example transaction file to play around with DAFT.

 

DAFT About Tab

 

 

Take a Look - DAFT in Action

 

DAFT in Action - take a look at the animation.

 

Where Can I Play around with DAFT?

 

Everything you need to run DAFT is available on GitHub here. The repository also contains a readme file with more information about all the parameters.

 

Final Thoughts

 

Please leave a comment and let me know what you think. Maybe you have some feature ideas? Also, share your experience with aggregating transaction tables. I can't wait to hear from you.

Version history
Last update:
‎11-18-2022 02:56 PM
Updated by:
