SAS Visual Forecasting 8.1 is a new forecasting solution on SAS Viya™ that leverages the Cloud Analytic Services (CAS) architecture for large-scale time series forecasting. The TSMODEL procedure is the engine that runs user-defined programs on a CAS server. You can write your own programs to manipulate, model, and forecast time series by using the scripting language and the packages available in Visual Forecasting. Each package, including the Automatic Time Series Model (ATSM) package, has a set of objects. These objects are the building blocks of your programs, and you can use them to create dynamic modeling and forecasting workflows for your time series. In this blog post, I'll explain the concepts behind the ATSM package and use a sample code snippet to demonstrate how you can use it.
An Overview of the Automatic Time Series Model (ATSM) Package
The ATSM package provides objects to support automatic time series modeling and forecasting. Each object is designed to carry out a particular task in the process of time series modeling. With the ATSM package, you typically execute the following steps during the modeling and forecasting process:
The following objects are available in the ATSM package to define the process:
The collector objects collect information, such as settings and results, from the DIAGNOSE, DIAGSPEC, SELSPEC, and FORENG object instances. The sample code snippet below shows some of the output tables that these collector objects save.
The repeater objects replay diagnostic control options to a DIAGNOSE object instance, and model selection graph specifications or parameter estimates to a FORENG object instance. Each object has one or more methods that you can use to initialize and configure the object instance, set input and output options, and retrieve attributes of the object instance. Methods can take parameters, whose values you set in the method call statements in your programs; if a method call omits a parameter, the default value of that parameter is used.
Status return codes (designated rc in method usage statements) are numeric values returned when a method of an object instance is called. These codes help you determine whether the method executed successfully, and are defined as follows:
For experienced SAS users, the TSMODEL procedure and the ATSM package are designed to provide functionality and capabilities similar to those of the following SAS 9.4 procedures:
Functionality equivalent to the HPFARIMASPEC, HPFESMSPEC, ARIMA, and ESM procedures is supported by the TSM (Time Series Models) package, which I'll cover in a future article.
Using ATSM Package for Automatic Time Series Modeling and Forecasting
Now, to use ATSM objects with the TSMODEL procedure, you can follow the steps listed below:
It is quite simple and intuitive to use the ATSM package for automatic time series modeling and forecasting. The following sample code snippet creates a sales forecast from the PRICEDATA data set in the SASHELP library.
/* This script illustrates how to use the ATSM package to diagnose the time series
   and select the best model to generate the final forecasts */

/* Create a CAS session and a CAS library */
cas mycas;
libname mylib cas sessref=mycas;

/* Load the PRICEDATA table into a CAS table */
data mylib.pricedata;
   set sashelp.pricedata;
run;

/* Use the TSMODEL procedure, the ATSM package, and scripting language
   statements to automatically model and forecast the time series */
proc tsmodel data=mylib.pricedata
             outobj=(outFor  = mylib.outFor
                     outEst  = mylib.outEst
                     outStat = mylib.outStat
                     modInfo = mylib.modInfo
                     outSel  = mylib.outSel);
   by regionname productline productname;
   id date interval=month;
   var sale / acc=sum;
   var price / acc=avg;

   /* Use the ATSM package */
   require atsm;

   submit;
      /* Declare ATSM objects */
      declare object dataFrame(TSDF);
      declare object diagnose(DIAGNOSE);
      declare object diagSpec(DIAGSPEC);
      declare object forecast(FORENG);
      declare object outFor(OUTFOR);
      declare object outEst(OUTEST);
      declare object outStat(OUTSTAT);
      declare object modInfo(OUTMODELINFO);
      declare object outSel(OUTSELECT);

      /* Set up the dependent and independent variables for the data frame */
      rc = dataFrame.initialize();
      rc = dataFrame.addY(sale);
      rc = dataFrame.addX(price, 'required', 'no', 'extend', 'stochastic');

      /* Set up the time series diagnostic specifications */
      rc = diagSpec.open();
      rc = diagSpec.setArimax('identify', 'both');
      rc = diagSpec.setEsm('method', 'best');
      rc = diagSpec.setTransform('transform', 'auto');
      rc = diagSpec.close();

      /* Diagnose the time series to generate a candidate model list */
      rc = diagnose.initialize(dataFrame);
      rc = diagnose.setSpec(diagSpec);
      rc = diagnose.run();

      /* Run model selection and forecast */
      rc = forecast.initialize(diagnose);
      rc = forecast.setOption('criterion', 'rmse');
      rc = forecast.setOption('lead', 12, 'holdoutpct', 0.1);
      rc = forecast.run();

      /* Collect the forecast results into the OUTOBJ= tables */
      rc = outFor.collect(forecast);
      rc = outEst.collect(forecast);
      rc = outStat.collect(forecast);
      rc = modInfo.collect(forecast);
      rc = outSel.collect(forecast);
   endsubmit;
run;
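The VAR statements in the snippet above use ACC=SUM for sale and ACC=AVG for price, so observations are accumulated into one value per monthly interval for each BY group before any modeling happens. The following Python sketch illustrates only that accumulation idea; it is not how TSMODEL implements it. The record layout (a BY-key tuple, a date, sale, and price), the function name accumulate_monthly, and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def accumulate_monthly(records):
    """Accumulate raw records into monthly buckets per BY group.

    Each record is a hypothetical (by_key, obs_date, sale, price) tuple.
    sale is summed (like ACC=SUM) and price is averaged (like ACC=AVG)
    within each (by_key, year, month) interval.
    """
    sale_sum = defaultdict(float)
    price_sum = defaultdict(float)
    price_count = defaultdict(int)
    for by_key, obs_date, sale, price in records:
        bucket = (by_key, obs_date.year, obs_date.month)
        sale_sum[bucket] += sale
        price_sum[bucket] += price
        price_count[bucket] += 1
    return {
        bucket: (sale_sum[bucket], price_sum[bucket] / price_count[bucket])
        for bucket in sorted(sale_sum)
    }


if __name__ == "__main__":
    # Made-up rows for one BY group: two observations in January, one in February.
    rows = [
        (("Region1", "Line1", "Product1"), date(1998, 1, 5), 10.0, 2.0),
        (("Region1", "Line1", "Product1"), date(1998, 1, 20), 30.0, 4.0),
        (("Region1", "Line1", "Product1"), date(1998, 2, 3), 25.0, 3.0),
    ]
    for bucket, (sale, price) in accumulate_monthly(rows).items():
        print(bucket, sale, price)
```

Because PRICEDATA is already monthly, accumulation in this example mostly matters when the input has several observations per interval, as in the made-up January rows.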
In summary, this code loads the ATSM package and uses instances of the TSDF, DIAGSPEC, DIAGNOSE, and FORENG objects. It generates five output tables by calling the collect method of five collector object instances (OUTFOR, OUTEST, OUTSTAT, OUTMODELINFO, and OUTSELECT) on the FORENG object instance.
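The FORENG options in the snippet ask for the best model to be chosen by RMSE ('criterion', 'rmse'), with 'holdoutpct', 0.1 reserving part of each series as a holdout sample and 'lead', 12 requesting 12 future periods. As a rough mental model, and assuming the holdout is the last 10% of the observations, holdout-based selection works like the Python sketch below. The two candidate models (a naive last-value forecast and a historical-mean forecast) are stand-ins chosen for brevity, not the ARIMAX and ESM candidates that DIAGNOSE generates, and all function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def naive_forecast(history, horizon):
    """Stand-in candidate: repeat the last observed value."""
    return [history[-1]] * horizon


def mean_forecast(history, horizon):
    """Stand-in candidate: repeat the historical mean."""
    return [sum(history) / len(history)] * horizon


CANDIDATES = {"naive": naive_forecast, "mean": mean_forecast}


def select_and_forecast(series, lead=12, holdout_pct=0.1):
    """Pick the candidate with the lowest holdout RMSE, then forecast `lead` periods.

    Assumes the holdout is the last holdout_pct share of the series (at least one point).
    """
    n_holdout = max(1, int(len(series) * holdout_pct))
    fit_part, holdout = series[:-n_holdout], series[-n_holdout:]
    scores = {
        name: rmse(holdout, model(fit_part, n_holdout))
        for name, model in CANDIDATES.items()
    }
    best = min(scores, key=scores.get)
    # This sketch refits the winning model on the full series for the final forecast.
    return best, scores, CANDIDATES[best](series, lead)
```

For a 20-point series, the sketch holds out the last 2 points, scores each candidate on them, and then forecasts 12 periods with whichever candidate scored the lower RMSE.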
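The process flow is as follows:
1. The TSDF object instance (dataFrame) defines sale as the dependent variable and price as an independent variable for each time series.
2. The DIAGSPEC object instance (diagSpec) sets the diagnostic specifications: ARIMAX identification, the best ESM method, and automatic transformation.
3. The DIAGNOSE object instance (diagnose) diagnoses each time series by using the data frame and the specifications, and generates a list of candidate models.
4. The FORENG object instance (forecast) selects the best candidate model by the RMSE criterion with a holdout sample ('holdoutpct', 0.1) and forecasts 12 periods ahead.
5. The five collector object instances save the results from the FORENG object instance to the CAS tables named in the OUTOBJ= option.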
Note that this code snippet makes only one pass through the input data to complete the time series analysis, which is a significant performance advantage for big data. The same analysis would take four passes through the input data with the SAS High-Performance Forecasting (HPF) procedures on SAS 9.4.
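To make the one-pass idea concrete, the Python sketch below streams records that are already ordered by the BY variables, hands each complete series to a single per-series function where diagnosis, selection, and forecasting would all happen, and counts how many input records are read. This is a conceptual analogy, not TSMODEL internals (which distribute BY groups across CAS workers); the sorted-input assumption and the names one_pass and process_series are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def process_series(values):
    """Hypothetical per-series work: diagnose, select, and forecast would all happen here.

    For illustration, it only returns the series length and total.
    """
    return len(values), sum(values)


def one_pass(records):
    """Read each (by_key, value) record exactly once, processing one BY group at a time.

    Assumes records arrive sorted by by_key, so every group is contiguous.
    """
    reads = 0
    results = {}

    def counted(rows):
        # Wrap the input so every record read is counted.
        nonlocal reads
        for row in rows:
            reads += 1
            yield row

    for by_key, group in groupby(counted(records), key=itemgetter(0)):
        values = [value for _, value in group]
        results[by_key] = process_series(values)
    return results, reads
```

In a multi-step design, each step (for example, a separate diagnosis run followed by a separate forecasting run) would loop over the full input again; here every step shares a single read of each record.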
The output of the collector objects is saved in CAS tables. You can review these tables to validate and improve your forecasts. Some of them, such as the OUTDIAG, OUTFMSG, and OUTEST tables, can also be used as inputs to the corresponding repeater objects (INDIAG, INFMSG, and INEST) in user-defined programs to customize time series analysis workflows.
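The collector/repeater pairing (for example, OUTEST followed by INEST) amounts to saving results from one run and feeding them back into a later run. The Python sketch below is only an analogy for that pattern and does not use any ATSM API: a hypothetical estimate step fits a simple exponential smoothing weight by grid search and returns it as a table row, and a hypothetical replay step forecasts from that saved row without re-estimating. Simple exponential smoothing is a stand-in model chosen for brevity.

```python
def ses_one_step_sse(series, alpha):
    """Sum of squared one-step-ahead errors for simple exponential smoothing."""
    level = series[0]
    sse = 0.0
    for y in series[1:]:
        sse += (y - level) ** 2
        level = alpha * y + (1 - alpha) * level
    return sse


def estimate(series):
    """'Collect' step: estimate alpha by a coarse grid search and return it as a table row."""
    grid = [i / 10 for i in range(1, 10)]
    best_alpha = min(grid, key=lambda a: ses_one_step_sse(series, a))
    return {"model": "SES", "alpha": best_alpha}


def forecast_with_estimates(series, est_row, lead):
    """'Replay' step: reuse the saved alpha instead of re-estimating it."""
    alpha = est_row["alpha"]
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    # Simple exponential smoothing produces a flat forecast at the final level.
    return [level] * lead
```

The saved row could be stored and passed back later, for example with newly arrived data, so that only the cheap replay step runs.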
Let us take a look at the output tables. Below is a description of the available output tables, with a screen capture of the first couple of rows for each table.
The model information table (mylib.modInfo) includes information about the selected model.
The forecast table (mylib.outFor) includes the forecasts for each time series, together with their standard errors (STD) and upper and lower confidence limits (UPPER and LOWER).
The estimate table (mylib.outEst) includes the model parameter estimates for each time series. Model forms are included in the _LABEL_ column.
The statistics table (mylib.outStat) includes the model fit statistics for each time series.
The selection table (mylib.outSel) includes the model selection statistics for the specified model families for each time series.
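To make the STD, LOWER, and UPPER columns of the forecast table and the fit statistics of the statistics table more concrete, here is a Python sketch of how such values are commonly computed. It assumes symmetric normal confidence limits at the 95% level and textbook definitions of the statistics without degrees-of-freedom adjustments. SAS may use different adjustments, may compute limits on a transformed scale (which makes them asymmetric) for transformed models, and reports statistics not shown here. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def confidence_limits(predict, std, alpha=0.05):
    """Symmetric normal limits: PREDICT -/+ z * STD, z being the (1 - alpha/2) normal quantile."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return predict - z * std, predict + z * std


def fit_statistics(actual, predicted):
    """Textbook fit statistics with no degrees-of-freedom adjustment."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    sse = sum(e * e for e in errors)
    mse = sse / n
    # MAPE is undefined where the actual value is zero, so those points are skipped.
    nonzero = [(e, a) for e, a in zip(errors, actual) if a != 0]
    mape = 100 * sum(abs(e / a) for e, a in nonzero) / len(nonzero) if nonzero else math.nan
    return {
        "SSE": sse,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": mape,
    }
```

With PREDICT = 100 and STD = 10, the sketch gives 95% limits of roughly 80.4 and 119.6.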
Users without much experience in time series forecasting can start with the ATSM package to automatically model and forecast time series. By following the process outlined in the sample code snippet above, they can perform multiple tasks: set the range of certain parameters in the setArimax and setEsm methods of a DIAGSPEC object instance; trigger the DIAGNOSE and FORENG object instances to automatically identify model forms, fit multiple models, and select the best one; and finally, generate the time series forecast from the selected model.
Visual Forecasting 8.1 provides time series packages, such as ATSM and TSM, that you can include in user-defined programs. The objects in these packages are the building blocks of your programs and can be combined with scripting language statements. The ATSM package and the TSMODEL procedure provide a structured process through which you can create automatic, dynamic, and efficient workflows. The TSM package enables you to create custom models that can be combined with the ATSM objects in these workflows. I'll cover how to use the TSM package to create and tune time series models in a future blog post. In my next blog post, I'll feature the open architecture of CAS and Visual Forecasting 8.1 and demonstrate how to use open programming interfaces, including Python, to run time series analysis on a CAS server.