How to execute Python or R models using the Open Source Code node in SAS Viya

Many of you might be familiar with the Open Source Integration node in SAS Enterprise Miner (13.1 onward), which could execute R models. Similar functionality is available in SAS Viya as part of the Model Studio application in SAS Visual Data Mining and Machine Learning (8.3 onward), where you can execute either Python or R models using the Open Source Code node.

The Open Source Code node, found under the Miscellaneous group, can run a Python or R model that can subsequently be assessed and compared with other SAS, Python or R models in the Model Studio pipeline. A Model Studio pipeline allows you to perform a series of tasks, be it data preprocessing or feature engineering, supervised learning or predictive modeling, data postprocessing or model ensembles, followed by comparison of models in a directed process flow. These tasks, referred to as "nodes" in Model Studio, provide a large choice of statistical, data mining, machine learning, model interpretation and deployment techniques for analyzing your data. Note that SAS Visual Data Mining and Machine Learning is part of the SAS Viya platform, which can handle large amounts of data using in-memory, distributed computing techniques.

Here is a simple example that compares forest models from SAS, Python and R using Open Source Code nodes that have been moved to the Supervised Learning group. The materials and steps necessary to run this example are available on GitHub, while the required data set, Default of credit card clients, can be downloaded from the UCI Machine Learning Repository.

Files on GitHub

Figure 1: Compare forest model using Forest and Open Source Code nodes in Model Studio

The above example trains and compares the following forest models from SAS, Python and R (left to right in Figure 1):

Forest node in Model Studio
randomForest package in R
scikit-learn RandomForestClassifier in Python
scikit-learn RandomForestClassifier in Python where categorical inputs are one-hot encoded
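The difference between the last two models is how categorical inputs reach the classifier. As a rough illustration of what one-hot encoding does, here is a minimal pure-Python sketch; it is not the code from the GitHub materials, and the helper name one_hot_encode and the sample rows are hypothetical. In practice you would more likely use pandas.get_dummies or scikit-learn's OneHotEncoder.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with a 0/1 indicator column per level."""
    # Collect the distinct levels; sorting keeps the column order stable.
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Hypothetical rows for illustration only; the real data is coded differently.
rows = [
    {"LIMIT_BAL": 20000, "EDUCATION": "university"},
    {"LIMIT_BAL": 120000, "EDUCATION": "graduate"},
    {"LIMIT_BAL": 90000, "EDUCATION": "university"},
]
encoded = one_hot_encode(rows, "EDUCATION")
for row in encoded:
    print(row)
```

Each level of the categorical input becomes its own numeric column, so a tree-based model can split on the levels individually instead of treating the input as a single value.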

To use the Open Source Code node, Python or R must be installed on the same machine as the Compute server microservice. On Linux, the python or Rscript executable must be available on the system path. If you have multiple versions of Python or R on your Compute server, you can set a preferred version by modifying the PATH environment variable. To do this, edit the sas-compsrv file in the /opt/sas/viya/config/etc/sysconfig/compsrv/default directory and add the following line:

export PATH=/path/to/your/python_or_r/bin/directory:${PATH}
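To see which executables a given PATH would pick up, a small check along these lines may help. This is only a sketch using the Python standard library: the function name resolve_executables and the optional preferred_dir argument are hypothetical, and the result reflects the environment the script runs in, which is not necessarily the environment of the Compute server process.

```python
import os
import shutil


def resolve_executables(names=("python", "Rscript"), preferred_dir=None):
    """Return {name: full path or None} for each executable on the search path."""
    search_path = os.environ.get("PATH", "")
    if preferred_dir:
        # Mirror the export line above: the preferred directory is searched first.
        search_path = preferred_dir + os.pathsep + search_path
    return {name: shutil.which(name, path=search_path) for name in names}


for name, location in resolve_executables().items():
    print(f"{name}: {location or 'not found on PATH'}")
```

Passing preferred_dir lets you preview the effect of prepending a directory before you edit the configuration file.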

You also need to install any necessary packages with admin or sudo privileges so that they are accessible to all users. This example requires the randomForest package in R and the scikit-learn package (and its dependencies such as numpy, scipy and pandas) in Python.
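One way to confirm that the Python packages are visible to a given interpreter is sketched below. It assumes you run it with the same python executable that the Compute server resolves; the helper name missing_packages is hypothetical. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names of the Python packages mentioned above.
REQUIRED = ("numpy", "scipy", "pandas", "sklearn")


def missing_packages(names=REQUIRED):
    """Return the subset of names that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Running the check as a non-admin user is a quick way to verify that the packages were installed for all users and not only for the installing account.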

Note that the goal of this example is not to build the best model with any of these software packages, but to show how easy it is to try the various algorithms available in Python or R within SAS Visual Data Mining and Machine Learning 8.3. Though not shown in this example, other preprocessing nodes such as Feature Extraction, Filtering, Imputation, Transformations and Variable Selection can be added as needed after the Data node and before any Open Source Code nodes in this pipeline.

Use this post to get started with the new Open Source Code node, which can execute Python or R models. Additional resources, including a video describing the Open Source Code node’s inner workings, are listed below.

SAS Visual Data Mining & Machine Learning 8.3 User’s Guide Reference Help: Open Source Code node

SAS Visual Data Mining & Machine Learning 8.3 User’s Guide

SAS Communities Library