
Distributed and Parallel Hyperparameter Tuning of RNN Forecasting Models in the TSMODEL procedure


Recurrent neural network (RNN) models, such as vanilla RNN, LSTM, and GRU, are widely used in a variety of areas. They are also well suited for time series forecasting because of their recurrent structure. SAS® Visual Forecasting supports recurrent neural network models through the time series neural network (TNF) and time series model (TSM/ATSM) packages of the TSMODEL procedure. RNN models have many hyperparameters, which you must tune properly to get the best-performing model. However, hyperparameter tuning usually requires substantial computing resources.

The TSMODEL procedure on the SAS® Viya® platform provides a very efficient way to run multiple tasks in parallel. Further, the INSCALAR= data option of the procedure enables easy customization and distribution of each parallel task’s input.  

This blog shows how the procedure makes RNN hyperparameter tuning extremely efficient by trying many hyperparameter combinations in as little time as it takes to fit just one model.

 

The By-Group Processing of the TSMODEL Procedure

 

The following example shows three simple RNN forecasting models, each with a different type of recurrent network structure. You can run each model separately on a worker node.

taiyeong_0-1681273641448.png

The run time is 2.54 seconds for RNN, 2.65 seconds for LSTM, and 2.96 seconds for GRU.

 

Let us see how BY-group processing in the TSMODEL procedure runs different RNN model types in parallel.

Suppose you have multiple worker nodes and, for simplicity, each node uses a single thread. You want to build three different forecasting models in parallel. BY-group processing in the TSMODEL procedure makes this easy.

The following example code runs the three recurrent neural network forecasting models in parallel.

 

taiyeong_0-1681273929248.png

The log shows that the run time is 2.92 seconds, which is almost the same as the run time for a single RNN forecasting model.

How does this work?

The time series data are distributed to the worker nodes, and each worker node gets its own script; in other words, each worker node has a different model specification.

As a result, the time series model on each node runs in parallel with the others.

 

The INSCALAR Data Option of the TSMODEL Procedure

 

The parameter values for each BY group can be set easily through the INSCALAR= data option, as follows.

 

taiyeong_0-1681274111205.png

First, you need to create a data set that contains the parameter values for each BY-group level.

In this example, the parameter variable name is rnntype and its values are RNN, LSTM, and GRU.
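A minimal sketch of such a data set follows. The table name mycas.rnn_parms and the numeric BY variable byid are assumptions for illustration; they must match the BY-group levels in your training data.

data mycas.rnn_parms;        /* INSCALAR= data set: one row per BY-group level */
   length rnntype $8;
   input byid rnntype $;
   datalines;
1 RNN
2 LSTM
3 GRU
;
run;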

taiyeong_1-1681274191496.png

Then you specify that data set in the INSCALAR= option of the PROC TSMODEL statement, and you list your parameter variable name in the INSCALAR statement.

Next, you use the variable name, rather than a literal value, as the value of the model option specification. You also need the byid variable in the BY statement.

taiyeong_0-1681274328185.png
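For reference, the overall shape of the code in the screenshot is roughly the following sketch. The time ID, dependent variable, object type, and the option name passed to SetOption() are assumptions for illustration; see the TSMODEL procedure and TNF package documentation for the exact syntax.

proc tsmodel data=mycas.train              /* time series training data that contains byid */
             inscalar=mycas.rnn_parms      /* parameter values for each BY-group level */
             /* the OUTOBJ= option that binds the output objects is omitted here */ ;
   by byid;                                /* BY variable present in both data sets */
   id date interval=month;                 /* assumed time ID variable and interval */
   var y;                                  /* assumed dependent series */
   inscalar rnntype;                       /* expose rnntype as a scalar inside the script */
   require tnf;
   submit;
      declare object nnmodel(tnf);                  /* object type is an assumption */
      rc = nnmodel.SetOption('RNNTYPE', rnntype);   /* pass the variable, not a literal value */
      /* ... remaining model setup, training, and output collection ... */
   endsubmit;
run;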

Distributed and Parallel Hyperparameter Tuning of RNN Forecasting Models

 

For recurrent neural network forecasting models, it is not easy to get a best-performing model without auto-tuning the hyperparameters.

Auto-tuning is usually conducted by fitting many models with different sets of parameter specifications, so distributed and parallel computing is essential for auto-tuning.

You can apply the same scheme from the previous section to auto-tune the RNN model hyperparameters.

In other words, you can do distributed and parallel auto-tuning easily by combining BY-group processing with the INSCALAR= data option.

Let us consider the following set of hyperparameters as an example.

 

taiyeong_1-1681274390112.png

There are five hyperparameters: ninput (size of the training input data window), nlayer (number of network layers), nneuron (number of neurons in each hidden layer), learningRate (the learning rate for SGD), and algorithm (the optimization algorithm).

Since each parameter has three distinct values, you have a total of 3^5 = 243 hyperparameter combinations, so you fit 243 RNN models to find the best-performing RNN forecasting model.
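The %MakeHyperParameterDataSet macro, described next, builds this grid for you. Conceptually, the grid is just a full-factorial enumeration, as in the following sketch; the candidate values and algorithm names here are placeholders, not the values used in this example.

data mycas.hyperparms;                      /* 3 x 3 x 3 x 3 x 3 = 243 rows */
   length algorithm $12;
   do ninput = 4, 8, 12;                    /* placeholder values */
      do nlayer = 1, 2, 3;
         do nneuron = 10, 20, 30;
            do learningRate = 0.001, 0.01, 0.1;
               do algorithm = 'SGD', 'ADAM', 'ADAGRAD';   /* placeholder algorithm names */
                  output;
               end;
            end;
         end;
      end;
   end;
run;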

 

The %MakeHyperParameterDataSet macro provides an easy way to create the INSCALAR data set from the specified hyperparameter set.

 

taiyeong_0-1681274640102.png

A partial view of the table created by %MakeHyperParameterDataSet is shown below.

 

taiyeong_1-1681274678241.png

 

Next, you can run the %MakeTuningDataSets macro to create the final INSCALAR data set and the modified training data set for auto-tuning the hyperparameters.  

The macro also creates the _ID_ variable in both output data sets; _ID_ is a reserved variable name for auto-tuning when you use the macro.

 

%MakeTuningDataSets(InData   = /* training input data */,
                    InParms  = /* user-specified tuning parameter set */,
                    ByVars   = /* BY-group variable names */,
                    OutParms = /* output data set for the INSCALAR= option */,
                    OutData  = /* modified training data for auto-tuning */
                    );

 

The output OutParms= data set is the same as the input InParms= data set when ByVars= is not used.
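For example, with no existing BY variables in the training data, a call might look like the following sketch; the InData= and InParms= data set names are placeholders.

%MakeTuningDataSets(InData   = mycas.train,              /* placeholder training data */
                    InParms  = mycas.hyperparms,         /* the 243-row hyperparameter set */
                    ByVars   = ,
                    OutParms = mycas.inscalar_data_set,
                    OutData  = mycas.traindata
                    );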

Once you have the INSCALAR data for tuning parameters and the modified training data, you are ready to write the hyperparameter auto-tuning code for your RNN forecasting model as follows.

  1. Set the OutData= data set to the DATA= option.
  2. Set the OutParms= data set to the INSCALAR= option.
  3. Add the _ID_ variable to the BY statement.
  4. Add the hyperparameter variables to be tuned to the INSCALAR statement.
  5. Use the hyperparameter variable names, instead of their specific values, in the SetOption() and SetOptimizer() calls.

The important steps in the TSMODEL procedure are highlighted in the example code below.

 

taiyeong_2-1681274726480.png
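The shape of that code, with the five steps marked, is roughly the following sketch. As before, the time ID, dependent variable, object type, option names, and SetOptimizer() signature are assumptions for illustration only.

proc tsmodel data=mycas.traindata                   /* step 1: the OutData= data set  */
             inscalar=mycas.inscalar_data_set       /* step 2: the OutParms= data set */
             /* the OUTOBJ= bindings for outtnf, outtnfstat, and outtnfopt are omitted here */ ;
   by _ID_;                                         /* step 3 */
   id date interval=month;                          /* assumed time ID variable and interval */
   var y;                                           /* assumed dependent series */
   inscalar ninput nlayer nneuron learningRate algorithm;   /* step 4 */
   require tnf;
   submit;
      declare object nnmodel(tnf);                  /* object type is an assumption */
      /* step 5: pass the scalar variables, not literal values */
      rc = nnmodel.SetOption('NINPUT',  ninput);
      rc = nnmodel.SetOption('NLAYER',  nlayer);
      rc = nnmodel.SetOption('NNEURON', nneuron);
      rc = nnmodel.SetOptimizer(algorithm, learningRate);   /* signature is an assumption */
      /* ... training and output collection ... */
   endsubmit;
run;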

The 243 models are executed in parallel. As the notes in the log show, the outtnfstat data set contains 243 observations, and each observation represents a set of fit statistics for one model.

So you can be sure that all 243 models ran, and the total run time is 15.20 seconds, which is the longest single-model run time among the 243 models.

 

As a final step of auto-tuning, you can run the %SelectBestRnnModel macro to retrieve the best-performing model among the 243 candidate models.

 

%SelectBestRnnModel(in_outtnfstat        = mycas.outtnfstat,
                    in_outtnfopt         = mycas.outtnfopt,
                    in_scalardata        = mycas.inscalar_data_set,
                    in_outtnf            = mycas.outtnf,
                    selection_region     = FIT,
                    selection_stat       = ptvlderror,
                    byvars               = ,
                    best_model_parameter = best_model_parameter,
                    best_outtnfstat      = best_outtnfstat,
                    best_outtnfopt       = best_outtnfopt,
                    best_outtnf          = best_outtnf
                    );

 

In this example, model ID 106 is selected as the best-performing model, and the tuned hyperparameter values are shown below.

 

taiyeong_0-1681275004142.png

 

The training and validation errors for the selected model are shown below.

taiyeong_1-1681275016057.png

You can also use the %RnnForecastPlots macro to generate the optimization history plot and the forecast plot for the selected model.

 

 taiyeong_3-1681274803727.png     taiyeong_4-1681274818061.png

 

 

Adding RNN Network Types to the Hyperparameter Set

 

The TSMODEL procedure provides three RNN network types for a forecasting model: vanilla RNN, LSTM, and GRU.

You can add the RNN type to the hyperparameter set and then run the models for all three RNN types concurrently, together with the hyperparameter auto-tuning set from the previous section.

taiyeong_2-1681275059328.png

 

In this case, the %MakeHyperParameterDataSet macro creates 729 (= 243 x 3) combinations of hyperparameters.

You can run the 729 models in parallel to choose the best-performing model.
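Continuing the earlier DATA step sketch, adding the network type amounts to one more DO loop around the grid (the values are still placeholders):

data mycas.hyperparms;                      /* 243 x 3 = 729 rows */
   length algorithm $12 rnntype $8;
   do rnntype = 'RNN', 'LSTM', 'GRU';
      do ninput = 4, 8, 12;
         do nlayer = 1, 2, 3;
            do nneuron = 10, 20, 30;
               do learningRate = 0.001, 0.01, 0.1;
                  do algorithm = 'SGD', 'ADAM', 'ADAGRAD';
                     output;
                  end;
               end;
            end;
         end;
      end;
   end;
run;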

 

taiyeong_3-1681275093267.png

As the run time in the log shows, the time is almost the same as in the previous example with only one RNN type (LSTM).

The run time for the 243 LSTM models is 15.20 seconds, and the run time for the 729 models spanning the three RNN types is 14.82 seconds.

This shows that distributed and parallel hyperparameter tuning scales well in the TSMODEL procedure.

 

With the three RNN model types in the pool, GRU model ID 289 is selected as the best-performing model.

The tuned hyperparameter values and the training and validation errors for the selected model are shown below.

taiyeong_5-1681275196994.png

taiyeong_4-1681275175734.png

The validation errors are smaller than those in the previous example.

This means that a better model was selected by adding 243 vanilla RNN models and 243 GRU models to the auto-tuning pool, without requiring more computing resources.

 

An Auto-Tuning Example with Existing BY Variables in the Input Data

 

The utility macros can also handle auto-tuning when the input data already contains BY variables. Here is an example that uses the price data in the SASHELP library.

The data set contains three BY variables: regionName, productLine, and productName.

taiyeong_6-1681275235261.png

In this example, the %MakeTuningDataSets macro creates the final input data sets for auto-tuning: mycas.inscalar_data_set and mycas.traindata.

The mycas.inscalar_data_set table contains the existing BY-group variables, and a partial view of it is shown below.

 

taiyeong_7-1681275271700.png
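A call that produces these two data sets might look like the following sketch; the name of the CAS copy of sashelp.pricedata is an assumption.

%MakeTuningDataSets(InData   = mycas.pricedata,      /* sashelp.pricedata loaded into CAS */
                    InParms  = mycas.hyperparms,     /* the 729-row hyperparameter set */
                    ByVars   = regionName productLine productName,
                    OutParms = mycas.inscalar_data_set,
                    OutData  = mycas.traindata
                    );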

 

The sashelp.pricedata data set does not have a price series for every combination of BY-variable levels; it has only 17 price series.

Therefore, the total number of models that you expect for auto-tuning is 12,393 (= 17 x 729).

taiyeong_8-1681275327370.png

As you can see in the log, the number of observations in the OUTTNFSTAT data set equals the total number of trained models. The total run time is only 12.66 seconds for training the 12,393 models.

 

The examples in this blog were run on the SAS RDCGRD grid, which has 124 nodes; each node has at least 32 threads, and some nodes have 80 threads.

Assuming an average of 50 threads per node, you can run 6,200 (= 124 x 50) models concurrently. The run time depends on your hardware configuration.

 

All the macros and example code are available in the following SAS Software GitHub repository:

https://github.com/sassoftware/sas-viya-forecasting

 

Special thanks to Thiago Quirino and Mahesh Joshi for their suggestions and discussions.    

 

 

 

 

 

 

 

 
