In my previous blog I described several ways to use a predictive model developed in SAS Model Studio for batch scoring big data in SAS Viya. This blog demonstrates how you can score these models from Python using the Jupyter notebook environment.
Model developers who prefer visual interfaces might work in SAS Model Studio to develop accurate machine learning models, while programmers can take advantage of the SAS SWAT interface to build their models in SAS Viya from a Python program. For this tutorial I will assume that the SAS Model Studio interface has been used to develop the machine learning model.
So, you have developed your favorite predictive model in SAS Visual Data Mining and Machine Learning (SAS VDMML) using SAS Model Studio. Using the pipeline comparison facility, you decided which model won the model tournament and will be used to batch score new data. The video "How to compare models in SAS" steps through this process. The figure below shows the selected champion model in the pipeline comparison tab of SAS Model Studio.
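You now have two options to score your favorite model from the Python Jupyter notebook environment: publish the model to CAS and run the published model with a CAS action, or use the score API endpoint that is created for the model.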
It is important to note that with both approaches the model scoring will NOT be executed in the Python environment. Model publishing and the API endpoint both provide an integration mechanism to call the scoring process from a Python program and execute the scoring in the SAS Viya environment. This allows Python programmers to take advantage of the powerful SAS Viya processing. For this showcase we will use a model that provides scoring assets in a so-called analytical store, or Astore. An Astore is a binary file that contains the state from a predictive analytic procedure, such as a random forest or gradient boosting. This state is created using the results from the training phase of model development. Astores can be created from predictive models developed in SAS VDMML or in SAS Enterprise Miner.
From the SAS Model Studio interface, you can use the publishing facility. In the Pipeline Comparison tab, select Publish Model from the overflow menu as shown in the figure below.
In the publishing wizard, select the publishing destination CAS for batch scoring and provide a name for the published model. Publishing creates an entry for the model in the destination table; by default, that table is called SAS_MODEL_TABLE. Publishing destinations are usually defined by your SAS Administrator in SAS Environment Manager. For more details, please refer to the online documentation.
In order to score the model published to the CAS server from a Python program, a connection needs to be established from your Python environment to a running CAS server. This can be done using an authentication request API call as shown in the figure below.
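Since the figure is not reproduced here, the following is a minimal sketch of one common way to open such a connection, assuming the SAS SWAT package is installed. The environment variable names (`CAS_HOST`, `CAS_PORT`, `CAS_USER`, `CAS_PASSWORD`) and both helper functions are hypothetical, and the default port 5570 assumes the binary protocol; your site may use a different port or authentication method.

```python
import os


def cas_settings_from_env(env=os.environ):
    """Collect CAS connection settings from environment variables.

    The variable names are hypothetical; adapt them to your own setup.
    """
    return {
        "hostname": env["CAS_HOST"],
        # Assumption: 5570 is used when no port is configured.
        "port": int(env.get("CAS_PORT", "5570")),
        "username": env["CAS_USER"],
        "password": env["CAS_PASSWORD"],
    }


def connect_to_cas(settings):
    """Open a CAS session through SWAT using the collected settings."""
    # Imported lazily so the helper above can be used without SWAT installed.
    import swat

    return swat.CAS(**settings)


# Example use in a notebook cell (requires a running CAS server):
# conn = connect_to_cas(cas_settings_from_env())
# print(conn.serverstatus())
```

Reading the credentials from the environment keeps them out of the notebook itself, which matters if the notebook is shared.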
In a Jupyter notebook, we can now use the published model for batch scoring from Python using the CAS actions “runModel” or “runModelLocal”. You need to provide the required parameters to the program according to your settings.
Running this code from Python will process the scoring of the input table in the SAS Viya environment and create the scored table. Both the input and the output table will be held in memory in the SAS Viya environment. In order to make the scored table available to other users or applications in SAS Viya, it needs to be promoted.
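As a rough sketch of what such a call might look like, the code below builds the parameters for the `runModelLocal` action of the `modelPublishing` action set and runs it on an existing SWAT connection. The helper names, the model name, the input and output table names, and the caslib are all hypothetical placeholders; only SAS_MODEL_TABLE comes from the publishing step described earlier. Check the parameter names against the action documentation for your SAS Viya release.

```python
def build_run_model_params(model_name, model_table, in_table, out_table, caslib):
    """Assemble the parameters for modelPublishing.runModelLocal.

    Assumption: all tables live in the same caslib.
    """
    return {
        "modelName": model_name,
        "modelTable": {"name": model_table, "caslib": caslib},
        "inTable": {"name": in_table, "caslib": caslib},
        "outTable": {"name": out_table, "caslib": caslib},
    }


def score_published_model(conn, params):
    """Run the published model in CAS; `conn` is an open SWAT connection."""
    conn.loadactionset("modelPublishing")
    return conn.modelPublishing.runModelLocal(**params)


# Example use (all names below are placeholders):
# params = build_run_model_params(
#     model_name="my_champion_model",
#     model_table="SAS_MODEL_TABLE",
#     in_table="NEW_DATA",
#     out_table="NEW_DATA_SCORED",
#     caslib="Public",
# )
# result = score_published_model(conn, params)
```

Keeping the parameter dictionary separate from the call makes it easy to inspect the settings before anything is sent to the server.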
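Promotion can also be done from the same Python session. The sketch below assumes an open SWAT connection and uses the `table.promote` CAS action; the helper name, table name and caslib are hypothetical.

```python
def promote_table(conn, name, caslib):
    """Promote a session-scope CAS table to global scope.

    `conn` is an open SWAT connection; `name` and `caslib` identify
    the scored table created by the scoring step.
    """
    return conn.table.promote(name=name, caslib=caslib)


# Example use (placeholder names):
# promote_table(conn, name="NEW_DATA_SCORED", caslib="Public")
```

If a global table with the same name already exists in the caslib, the promote step will likely fail, so you may need to drop the old table first.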
As a second option, we can use an API endpoint that is created automatically for batch scoring and can be called from different front ends, such as SAS, Python or REST. In the Pipeline Comparison tab of SAS Model Studio, select Download score API from the overflow menu. Then choose Python as the front end.
Copy the provided Python code snippet into a program in Jupyter notebook and insert the required parameters to run the program.
Running this code in a Jupyter notebook will trigger the execution of the scoring in SAS Viya and create the scoring output table in the CAS environment.
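The downloaded snippet is generated for your project, so it is not reproduced here. Purely for orientation, the sketch below shows one way an Astore can be scored directly through SWAT with the `astore.score` CAS action. This is an assumption about a comparable approach, not the content of the generated code, and the helper name, table names and caslib are hypothetical.

```python
def score_astore(conn, in_table, astore_table, out_table, caslib):
    """Score an input table with an Astore that is loaded in CAS.

    `conn` is an open SWAT connection. The scoring itself runs in CAS;
    Python only submits the request.
    """
    conn.loadactionset("astore")
    return conn.astore.score(
        table={"name": in_table, "caslib": caslib},
        rstore={"name": astore_table, "caslib": caslib},
        # replace=True overwrites an existing session table of the same name.
        out={"name": out_table, "caslib": caslib, "replace": True},
    )


# Example use (placeholder names):
# score_astore(conn, "NEW_DATA", "MY_ASTORE", "NEW_DATA_SCORED", "Public")
```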
Hopefully the examples in this blog demonstrated how easy it is to score Astore models in CAS from a Python program using Jupyter notebook.
Finally, I would like to thank my colleagues at SAS who helped review and publish this blog.