Many of you might be familiar with the Open Source Integration node in SAS Enterprise Miner (13.1 onward), which can execute R models. Similar functionality is available in SAS Viya as part of the Model Studio application in SAS Visual Data Mining and Machine Learning (8.3 onward), where you can execute either Python or R models using the Open Source Code node.
The Open Source Code node, found under the Miscellaneous group, can run a Python or R model that can then be assessed and compared with other SAS, Python or R models in the Model Studio pipeline. A Model Studio pipeline lets you perform a series of tasks in a directed process flow: data preprocessing or feature engineering, supervised learning or predictive modeling, data postprocessing or model ensembles, followed by model comparison. These tasks, referred to as "nodes" in Model Studio, offer a wide choice of statistical, data mining, machine learning, model interpretation and deployment techniques for analyzing your data. Note that SAS Visual Data Mining and Machine Learning is part of the SAS Viya platform, which handles large amounts of data using in-memory, distributed computing techniques.
Here is a simple example that compares forest models from SAS, Python and R using Open Source Code nodes that have been moved to the Supervised Learning group. The materials and steps necessary to run this example are available on GitHub, and the required data (Default of credit card clients) can be downloaded from the UCI Machine Learning Repository.
The above example trains and compares forest models from SAS, Python and R (left to right in Figure 1).
To use the Open Source Code node, Python or R must be installed on the same machine as the Compute server microservice. On Linux, the python or Rscript executable must be available in the system path. If you have multiple versions of Python or R on your Compute server, you can set a preferred version by modifying the PATH environment variable. To do this, edit the sas-compsrv file in the /opt/sas/viya/config/etc/sysconfig/compsrv/default directory and add a line that places the preferred installation's directory at the front of PATH.
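Whichever way PATH ends up being set, it can help to confirm which interpreters it actually resolves to. This is a check to run afterwards, not the configuration line itself. The following is a minimal Python sketch that is not part of the original example materials: the function name find_interpreters is hypothetical, and it relies only on shutil.which from the standard library. Run it as the same user, and with the same environment, as the Compute server to get a meaningful answer.

```python
import shutil


def find_interpreters(names=("python", "Rscript")):
    """Map each executable name to the full path found on PATH, or None."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Report the first match on PATH for each executable the node needs.
    for name, path in find_interpreters().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If the reported path is not the version you intended, the preferred installation's directory is probably not early enough in PATH.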
You also need to install any necessary packages with admin or sudo privileges so that they are accessible to all users. This example requires the randomForest package in R and the scikit-learn package (and its dependencies, such as numpy, scipy and pandas) in Python.
Note that the goal of this example is not to build the best model with any of these software packages, but to show how easy it is to try the various algorithms available in Python or R within SAS Visual Data Mining and Machine Learning 8.3. Though not shown in this example, other preprocessing nodes such as Feature Extraction, Filtering, Imputation, Transformations and Variable Selection can be added as needed after the Data node and before any Open Source Code nodes in this pipeline.
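To check that the Python packages are visible to a given user, a short script along these lines can be run with the same interpreter the node will use. It is a sketch rather than an official diagnostic: the function name missing_modules is hypothetical, and note that scikit-learn is imported under the name sklearn.

```python
import importlib.util


def missing_modules(modules=("sklearn", "numpy", "scipy", "pandas")):
    """Return the import names that cannot be located in this environment."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Running it as a non-admin user is the more useful test, since a package installed only into one user's home directory will show up as missing for everyone else.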
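To give a feel for what the Python side of such a node might contain, here is a minimal scikit-learn forest sketch. It is not the code from the GitHub materials. The names dm_traindf, dm_inputdf, dm_interval_input, dm_dec_target and dm_scoreddf, as well as the P_ column naming, are assumptions about what the node provides and expects; here they are filled with synthetic stand-in data so the sketch runs on its own. Check the node's documentation for the exact names and required output columns.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for objects the node is assumed to supply.
rng = np.random.default_rng(0)
dm_interval_input = ["x1", "x2"]   # assumed: list of interval input names
dm_dec_target = "y"                # assumed: name of the target variable
dm_inputdf = pd.DataFrame(rng.normal(size=(200, 2)), columns=dm_interval_input)
dm_inputdf[dm_dec_target] = (dm_inputdf["x1"] + dm_inputdf["x2"] > 0).astype(int)
dm_traindf = dm_inputdf.iloc[:150]  # assumed: training partition

# Fit a forest on the training rows only.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(dm_traindf[dm_interval_input], dm_traindf[dm_dec_target])

# Score every row; one posterior-probability column per target level.
probs = model.predict_proba(dm_inputdf[dm_interval_input])
dm_scoreddf = pd.DataFrame(
    probs, columns=[f"P_{dm_dec_target}{level}" for level in model.classes_]
)
print(dm_scoreddf.head())
```

The hyperparameters here are arbitrary; in keeping with the point above, the sketch is about the shape of the code, not about tuning a good model.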
Use this post to get started and try out the new Open Source Code node that can execute Python or R models. Additional resources including a video describing the Open Source Code node’s inner workings can be located below.