
Implementing Out-of-Time Testing in Model Studio


 

The purpose of supervised machine learning is to generate accurate predictions for the future. Good models strive to achieve low training error, but it is just as important to achieve low generalization error. The training process needs to account for this compromise and make an honest assessment of the accuracy of the model. Assessing a candidate model on the data that was used to train it would direct the algorithm to overfit to that training data. Instead, we normally take the historical data with a known target and split it into three partitions: training, validation, and test.
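To make the split concrete, here is a minimal sketch in Python using pandas and scikit-learn. These libraries, the file name, the column name target, and the 60/20/20 percentages are all illustrative assumptions, not what Model Studio runs internally.

```python
# Conceptual sketch of a 60/20/20 train/validation/test split (hypothetical names).
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("historical_data.csv")          # historical data with a known target
X, y = df.drop(columns="target"), df["target"]

# First carve out 20% as the test partition.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Then split the remainder 75/25 to get 60% training and 20% validation overall.
X_train, X_valid, y_train, y_valid = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42)
```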

 

A major portion, the training data set, is used for fitting the model.

 

Validation data is used to assess the model during training for the purpose of selecting variables and adjusting parameters. Validation data sets are instrumental in preventing overfitting. In lieu of a separate validation set (which might not be feasible for smaller data sets), SAS Viya Machine Learning offers k-fold cross validation through the Autotuning capability. Whether through a validation set or through k-fold cross validation, ensure that the training process assesses the error on data that is not used to train the model.
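As a rough, hedged analogy to what Autotuning achieves with k-fold cross validation, the sketch below tunes hyperparameters with scikit-learn's GridSearchCV. This is an assumption for illustration only, not the SAS autotuning engine, and it reuses the hypothetical X_train and y_train from the earlier split sketch.

```python
# Conceptual sketch: 5-fold cross validation to tune hyperparameters,
# so that model assessment never uses the rows a candidate was trained on.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {"max_depth": [2, 3, 4], "learning_rate": [0.05, 0.1]}
search = GridSearchCV(
    GradientBoostingClassifier(),
    param_grid,
    cv=5,                  # each candidate is scored on folds it did not see in training
    scoring="roc_auc")
search.fit(X_train, y_train)      # X_train/y_train from the earlier split sketch
best_model = search.best_estimator_
```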

 

Test data is used at the end of model fitting to obtain a final assessment of how the model generalizes to new data. The reason for using test data (instead of validation data) is that validation data plays a role in the model training process. Hence, using validation data might lead to the same biased assessments as using training data. For this reason, a test data set should be used only at the end of the analysis and should not play a role in the model training process. For example, you make predictions for new data (say 2024) using a model trained, validated and tested on 2019–2023 data. This classic setup is illustrated below.

 

01SS_InTime.png

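Continuing the earlier hypothetical sketch, the test partition is touched exactly once for a final assessment, and only then is the fitted model applied to genuinely new data such as a 2024 scoring table. File and column names are again illustrative.

```python
# Conceptual sketch: one final, honest assessment on test data,
# then scoring new (2024) records whose target is unknown.
from sklearn.metrics import roc_auc_score

test_auc = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])
print(f"Final test AUC: {test_auc:.3f}")

new_2024 = pd.read_csv("score_2024.csv")                       # hypothetical new data
new_2024["p_event"] = best_model.predict_proba(new_2024[X.columns])[:, 1]
```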

 

This is essentially an in-time testing approach that can easily be implemented in Model Studio when you create a Data Mining and Machine Learning project. You can either select the Create partition variable checkbox under the Advanced project settings and then specify the training, validation, and/or test percentages, or use a pre-existing partition variable in the data by setting its Role to Partition.
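If you take the second option, the partition variable is just a column in the data. The sketch below adds one in Python before the table is loaded; the column name _partind_ and the value mapping (1 = training, 0 = validation, 2 = test) are illustrative assumptions, since the actual mapping is whatever you specify when you assign the Partition role.

```python
# Conceptual sketch: add a partition indicator column before loading the data.
import numpy as np
import pandas as pd

df = pd.read_csv("historical_data.csv")
rng = np.random.default_rng(42)
# Hypothetical mapping: 1 = training (60%), 0 = validation (20%), 2 = test (20%).
df["_partind_"] = rng.choice([1, 0, 2], size=len(df), p=[0.6, 0.2, 0.2])
df.to_csv("historical_data_partitioned.csv", index=False)
```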

 

By default, Model Studio uses the test data to select the champion model, if a test data set exists. Otherwise, Model Studio uses the validation data to select the champion model. If a validation data set does not exist, Model Studio uses the training data to select a champion.
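In other words, champion selection falls back from test to validation to training. A tiny conceptual sketch of that fallback logic (not Model Studio code):

```python
# Conceptual sketch of the default fallback for choosing which partition
# drives champion selection.
def selection_partition(available_partitions):
    for part in ("test", "validation", "training"):
        if part in available_partitions:
            return part
    raise ValueError("no partition available")

print(selection_partition({"training", "validation"}))  # -> "validation"
```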

 

Simple! Nonetheless, consider the following scenario:

 

"My team built a churn model for a telecommunications company where the churn rate was 10%. The model's sensitivity (the percentage of actual churners correctly predicted by the model) on test data was 72%, but when we rolled out the model and tracked the results after 5 months, the sensitivity had dropped to 45%."

 

Have you encountered such a problem? There could be many reasons for such a drop. Perhaps out-of-time testing would have helped identify the issue with the model before it was rolled out.

 

Out-of-time testing is the process of validating the model on the latest unseen data and measuring its performance to see whether there is any dip in prediction performance. For example, you hold out 2023, train the model on 2019–2022 data, and test it out of time on the 2023 holdout data. This out-of-time testing is illustrated below.

 

02SS_OutTime.png
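In code terms, an out-of-time split is a filter on the time column rather than a random sample. A minimal Python sketch, where the file and the year column are hypothetical placeholders:

```python
# Conceptual sketch: hold out the most recent period instead of a random sample.
import pandas as pd

df = pd.read_csv("historical_data.csv")
in_time = df[df["year"].between(2019, 2022)]    # used for train/validate/test
holdout = df[df["year"] == 2023]                # out-of-time holdout, scored later
```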

 

Holdout data is a capability that is new to Model Studio. Because the test data is used in model comparison, using it again to compare different pipelines might introduce bias. By setting aside data for the holdout partition, you add a further safeguard against generalization error. However, one of the biggest drawbacks of this approach is that you are not using the most recent data to make predictions.

 

Note that the holdout data set generally comes from a different time period than the training, validation, and/or test sets and thus cannot be partitioned while you are creating the project in Model Studio. The proof of a model's stability is in its ability to perform well year after year. A data set from a different time period, often called an out-of-time holdout data set, is a good way to verify model stability, although such a holdout data set is not always available.

 

Once you have holdout data, you can score one or more models on it in Model Studio. To score holdout data, complete the following steps:

 

Select the three vertical dots icon in the upper right corner of the Pipeline Comparison tab and then select Score holdout data.  

 

03SS_holdData.png

 

The Browse Data window appears. Select the data set that contains the holdout data that you want scored. Click OK. Model Studio will score the holdout data.

 

Edited_04SS_ScoreHoldout.png

 

Once all the models are scored, click Close. To see the results of this process, use the Data menu below Pipeline Comparison to select Holdout.

 

05SS_SeeHoldout.png

 

This enables you to see the performance of the champion model (which was selected using the test data) on the holdout data and to get an idea of the expected performance (for example, KS (Youden)) when this model is implemented in the field.

 

06SS_ResultsHoldout.png
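For reference, the KS (Youden) statistic reported here is the maximum separation between the true positive rate and the false positive rate. If you exported the scored holdout table, you could reproduce it roughly as follows (a sketch using scikit-learn; the file and column names are hypothetical):

```python
# Conceptual sketch: recompute KS (Youden) = max(TPR - FPR) on scored holdout data.
import pandas as pd
from sklearn.metrics import roc_curve

scored = pd.read_csv("holdout_scored.csv")        # hypothetical exported table
fpr, tpr, _ = roc_curve(scored["target"], scored["p_event"])
ks_youden = (tpr - fpr).max()
print(f"KS (Youden) on holdout: {ks_youden:.3f}")
```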

 

Close the Results window, and then click the Insights tab.

 

Edited_07SS_GraphHoldout.png

 

The Insights tab provides model assessment results on the holdout data along with the other partitions.

 

Some complications might arise when you interpret these model assessment results. For example, if you selected Enable event-based sampling, you must use model performance measures (like KS (Youden) and the ROC curve) that are not affected by the outcome proportions. Note that Model Studio uses the score code when scoring holdout data. The score code contains a section titled Adjust Posterior Probabilities. This code block modifies each posterior probability by multiplying it by the ratio of the actual probability to the event-based sampling values specified previously. Most model fit statistics, especially those related to prediction decisions (like misclassification rate), and some of the assessment plots (like cumulative lift) are closely tied to the outcome proportions in the training data. If the outcome proportions in the training, validation, and test samples do not match the outcome proportions in the holdout sample being scored, model performance can be greatly misestimated.
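Conceptually, that adjustment rescales each class's posterior by the ratio of its population prior to its sampled proportion and then renormalizes. The following is a hedged sketch of the standard prior-correction formula, for intuition only; consult the generated score code for the exact computation that Model Studio performs.

```python
# Conceptual sketch of prior correction after event-based (over)sampling.
# p_sample: posterior from a model trained on the oversampled data.
# pi_event: true event rate in the population (e.g., 0.10).
# rho_event: event rate in the event-based sample (e.g., 0.50).
def adjust_posterior(p_sample, pi_event, rho_event):
    w_event = pi_event / rho_event                     # reweight the event class
    w_nonevent = (1 - pi_event) / (1 - rho_event)      # reweight the non-event class
    num = p_sample * w_event
    return num / (num + (1 - p_sample) * w_nonevent)   # renormalize to [0, 1]

print(adjust_posterior(0.5, pi_event=0.10, rho_event=0.50))  # approx. 0.10
```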

 

So, what is an effective out-of-time testing approach in Model Studio? You can follow this process:

 

  1. Find the best model based on the test data, using the Model Comparison node and/or the Pipeline Comparison tab in Model Studio. You can also use only two partitions (training and validation) instead of three (training, validation, and test).
  2. Score the holdout data and find the best model based on this out-of-time sample.
  3. If the best model based on the holdout data is different, and you find the case compelling enough, manually choose that model as the champion on the comparison screen and pass it over to SAS Model Manager for scoring (a conceptual sketch of this comparison follows the list).
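The sketch below illustrates the comparison in step 3 with hypothetical numbers; in practice these values come from the Pipeline Comparison results and the scored holdout data.

```python
# Conceptual sketch: compare candidates on test vs. out-of-time holdout KS,
# and flag a manual override when the holdout ranking disagrees.
candidates = {
    "Gradient Boosting": {"test_ks": 0.46, "holdout_ks": 0.31},
    "Forest":            {"test_ks": 0.44, "holdout_ks": 0.38},
}
test_champion = max(candidates, key=lambda m: candidates[m]["test_ks"])
holdout_champion = max(candidates, key=lambda m: candidates[m]["holdout_ks"])
if holdout_champion != test_champion:
    print(f"Consider overriding {test_champion} with {holdout_champion} "
          "before registering the model to SAS Model Manager.")
```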

 

Find more articles from SAS Global Enablement and Learning here.

