This document is the first in a series that serves as a brief introduction to using SAS Viya for time series analysis. SAS Viya runs in a distributed computing environment on a Cloud Analytic Services (CAS) server, enabling high-speed analytics from virtually anywhere. However, this means that your data is also distributed in memory, so the order of the rows is not preserved, which prevents the direct use of recursive time series techniques. PROC TSMODEL (Time Series Model) is a SAS Viya procedure that accumulates timestamped input data into time series at a user-defined interval and executes user-defined programs on those series. PROC TSMODEL ensures that only one process works on each unique accumulated time series, which maintains the order of the data and calculations while still leveraging the power of distributed computing.
Timestamped data need not be recorded at a fixed interval, so they must be accumulated over intervals such as daily, weekly or monthly. The time series may also be delineated based on the values of the variables listed in the BY statement. Program statements are processed once for each BY group, in contrast to the SAS DATA step, which processes data row by row.
In this document, we cover the basics of setting up a CAS session and a library that will house the data and any models created. We copy a dataset into the library and use PROC TSMODEL to accumulate the data into time series with BY groups and an ID variable. Finally, we create a new time series containing a time-lagged version of the original data. This forms the basis for more advanced models, in which the correlations between the original and lagged data can be used to find patterns of seasonality, intermittency, trends and so forth. These additional topics will be covered in the subsequent papers in this series.
The first step in using PROC TSMODEL in SAS Viya is setting up a CAS session and linking a library to it. The two lines of code below show an example of creating a CAS session named ‘mycas’ and a library named ‘mylib’ linked to the ‘mycas’ session.
cas mycas;
libname mylib cas sessref = mycas;
The code below shows how a data table can be copied from the ‘sashelp’ library to the ‘mylib’ library using a DATA step. A data table named ‘pricedata’ is created in the ‘mylib’ library and populated with the data from the ‘pricedata’ table in the ‘sashelp’ library.
data mylib.pricedata;
   set sashelp.pricedata;
run;
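As an alternative to the DATA step above, the same table could be loaded with PROC CASUTIL. This is only a sketch under stated assumptions: it assumes the ‘mycas’ session is still active and that ‘mylib’ points to the session's active caslib, which is the case when the LIBNAME statement is issued without a caslib option. The REPLACE option is added here so the step can be rerun; it is not part of the original example.

```sas
/* Sketch: load sashelp.pricedata into CAS with PROC CASUTIL     */
/* Assumes the 'mycas' session from the earlier step is active.  */
proc casutil sessref = mycas;
   /* casout= names the in-memory table; replace overwrites an existing copy */
   load data = sashelp.pricedata casout = "pricedata" replace;
   /* list the tables in the active caslib to confirm the load */
   list tables;
quit;
```

Either approach leaves a ‘pricedata’ table in memory that PROC TSMODEL can read through the ‘mylib’ libref.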
Next, the time series are accumulated from the input data table in PROC TSMODEL, as shown in the first portion of the code below. The output data table, ‘timeSeries’, contains the time series for the variables listed in the VAR statements. Each of the variables ‘sale’, ‘price’ and ‘discount’ is accumulated over the interval specified in the ID statement, and either summed or averaged as specified, to form a unique time series for each combination of values of the BY variables. For example, the ‘sale’ variable contains the sum of sales accumulated over each month for each combination of the values of ‘regionname’, ‘productline’ and ‘productname’, while the ‘price’ and ‘discount’ variables contain the average of the accumulated values. The ID for each row of values in the time series is the ‘date’ variable at the specified interval. It contains the month and year, which range from Jan 1998 to Dec 2002. The ID variable repeats this sequence for each BY group.
The outarray table ‘newTimeSeries’ and the outarrays variable ‘lagSale’ are declared for creating a new data table that can be manipulated through CMP programming statements as shown in the next section.
proc tsmodel data = mylib.pricedata
             out = mylib.timeSeries
             outarray = mylib.newTimeSeries;
   by regionname productline productname;
   id date interval=month;
   var sale /accumulate=sum;
   var price discount /accumulate=avg;
   outarrays lagSale;
   *Programming statements described in the next section;
   submit;
      *create lag of sale variable;
      do i = 1 to dim(sale);
         if i = 1 then lagSale[i] = .;
         else lagSale[i] = sale[i-1];
      end;
   endsubmit;
run;
The outarray = mylib.newTimeSeries argument to PROC TSMODEL creates another data table that contains all the information from the ‘timeSeries’ table plus the outarrays variable, ‘lagSale’, declared below the VAR statements. Now we are ready to execute some CMP programming statements on the time series in the ‘newTimeSeries’ outarray table. All programming statements appear between the submit and endsubmit keywords. In this example, we want to create a time series containing the ‘sale’ variable with a lag of one month. A DO loop iterates over the ‘sale’ array and populates ‘lagSale’ with the previous month’s value of ‘sale’, except in the i=1 case, where no previous month of data is available. In that case, ‘lagSale’ is set to a missing value.
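The same loop generalizes to a lag of more than one month. The sketch below is illustrative rather than part of the original example: the table name ‘lagTimeSeries’, the array name ‘lagSaleK’ and the lag length k = 3 are hypothetical choices. The first k elements have no earlier value to copy, so they are set to missing.

```sas
/* Sketch: lag of k months; lagTimeSeries, lagSaleK and k are hypothetical */
proc tsmodel data = mylib.pricedata
             outarray = mylib.lagTimeSeries;
   by regionname productline productname;
   id date interval=month;
   var sale /accumulate=sum;
   outarrays lagSaleK;
   submit;
      k = 3;  /* assumed lag length in months */
      do i = 1 to dim(sale);
         /* no value exists k months before the first k observations */
         if i <= k then lagSaleK[i] = .;
         else lagSaleK[i] = sale[i-k];
      end;
   endsubmit;
run;
```

Because the program runs once per BY group, the lag never crosses from one product's series into the next.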
The ‘mylib’ library now contains the ‘pricedata’, ‘timeSeries’ and ‘newTimeSeries’ tables. The ‘timeSeries’ table contains the time series of the ‘sale’, ‘price’ and ‘discount’ variables, accumulated monthly by date and either summed or averaged for each BY group. The ‘newTimeSeries’ table contains a copy of the full ‘timeSeries’ table plus the ‘lagSale’ variable, which is the ‘sale’ variable with a lag of one month.
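One way to check the result is to print the first few rows of ‘newTimeSeries’. The following is a minimal sketch, assuming the tables above were created as described. Because rows in a CAS table have no guaranteed order, it first copies the table into the WORK library and sorts it by the BY variables and ‘date’; the WORK table name is a hypothetical choice.

```sas
/* Sketch: copy the CAS table to WORK, sorted so each series reads in date order */
proc sort data = mylib.newTimeSeries out = work.newTimeSeries;
   by regionname productline productname date;
run;

/* Print the first 12 months of the first BY group */
proc print data = work.newTimeSeries(obs = 12);
   var regionname productline productname date sale lagSale;
run;
```

In the printed rows, ‘lagSale’ should be missing in the first month of the series and equal to the previous row's ‘sale’ value afterward.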
Future editions of this series of papers will demonstrate how to use accumulated time series data to evaluate correlation and seasonality using the TSA (Time Series Analysis) package and create models for forecasting using ATSM (Automatic Time Series Model) and TSM (Time Series Model) packages.