The announcement of SAS VDMML in September 2016 introduced the ability to interface with CAS from common third-party languages such as Java, Lua and Python. As a result, it's no secret that SAS Viya is expected to change the way customers interact with the SAS analytics engine (CAS). Of the supported languages, Python is surging in popularity and is the one most commonly associated with analytics. With this in mind, this blog reviews what is required to establish a connection from Python to CAS and shows some basic interactions.
We will review two ways to interact with CAS using Python. The first method is to simply use the command line interface. Once invoked, Python commands can be entered and executed interactively. The second method is to use Jupyter Notebook, a web application that allows you to create and share documents that contain live code, visualizations, equations and explanatory text. It is bundled with many other packages in an open source data science platform named Anaconda, which is powered by Python.
The following diagram shows the components required to submit CAS actions from Python, using either the Python typically delivered with the operating system or an Anaconda installation. In the image below both Python environments exist on the same machine. However, they could be on separate machines, or one or both could be configured on the Viya host (not shown) or on one of the CAS machines. The point is that the placement of the Python session is flexible, as long as the host machine runs Linux. This includes a Linux VM running on a Windows or Mac workstation.
|Figure 1 - Python and SWAT Deployment|
Before we get started let's identify the items we need installed in order to access CAS from Python.
Of course you will need Python. Accessing CAS from Python is only supported on Linux, and the supported versions are Python 2.7+ and 3.4+. Python is typically installed as part of a base Linux system. Issue the following command to check if it is installed and verify the version.
An alternative way to check if Python is installed is to simply enter the python command. This will start an interactive Python session.
Notice that the first line of output displays the version. Once verified, enter quit() to exit. If for some reason it is not installed, it can easily be installed using the following command as root or a user with admin privileges.
Python can be on the same machine as the Viya components, CAS components, or on a machine independent of SAS software. However, if it is on a different machine, it must have network access to the machine where the CAS controller is located.
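The network requirement can be verified before going any further. The following is a minimal sketch using only the standard library; `can_reach` is a hypothetical helper, and the hostname and port in the usage comment are placeholders that must be replaced with the values for your CAS controller.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host name and attempts a TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


# Example usage (placeholder host and port, substitute your own):
# print(can_reach("cas-controller.example.com", 5570))
```

A result of True only shows that the port accepts TCP connections. It does not confirm that a CAS server is listening or that your credentials are valid.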
The libnuma.so.1 shared library is required to make binary protocol connections to CAS. Like Python, in most cases this library will already be available. You can check using the "rpm" command.
Also like Python, if it is not already installed, it is easy to install using the "yum" command as root or a user with admin privileges.
Pip is the package manager for Python and is the tool used to install Python packages. With Python 2.7+ you will need pip, and with Python 3.4+ you will need pip3. Starting with Python 3.4, pip3 is included by default with the Python binary installers. There may be a symbolic link for pip that points to pip3. Check to see if it is installed.
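As a complement to the rpm query, the library lookup can also be attempted from Python. This is a minimal sketch, assuming a Linux host; `find_libnuma` is a hypothetical helper built on the standard `ctypes.util.find_library` function, which searches the locations known to the system's dynamic linker.

```python
import ctypes.util


def find_libnuma():
    """Return the file name of the NUMA shared library, or None if not found."""
    # On Linux, find_library("numa") returns a name such as "libnuma.so.1"
    # when the library is visible to the dynamic linker.
    return ctypes.util.find_library("numa")


name = find_libnuma()
if name is None:
    print("NUMA library not found")
else:
    print("Found NUMA library:", name)
```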
If it is not installed, install it via the "yum" command.
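The presence of pip can also be checked from within Python. The sketch below assumes Python 3.4 or later; `pip_status` is a hypothetical helper that reports whether the pip module is importable by the current interpreter and where, if anywhere, the `pip` and `pip3` executables are found on the PATH.

```python
import importlib.util
import shutil


def pip_status():
    """Report pip availability for this interpreter and on the PATH."""
    return {
        # True if "import pip" would succeed in this interpreter
        "module": importlib.util.find_spec("pip") is not None,
        # Full path to each executable, or None if it is not on the PATH
        "pip": shutil.which("pip"),
        "pip3": shutil.which("pip3"),
    }


print(pip_status())
```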
The SAS Scripting Wrapper for Analytics Transfer, or SWAT, is the key package that enables Python to interact with CAS. The SWAT package is dependent upon six other packages, but the pip installer will check for these packages and install them automatically if they are not already installed.
It can be obtained either from the download section of support.sas.com or from the SAS Git repository:
Once downloaded from either of these sites, the package can be installed using the pip command.
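After the install, a quick preflight check can confirm that the interpreter version falls within the supported range (2.7+ or 3.4+) and that the SWAT package is importable. This is a minimal sketch; `is_supported_python` and `swat_available` are hypothetical helper names, not part of SWAT.

```python
import sys


def is_supported_python(version_info=None):
    """Return True for Python 2.7 or for Python 3.4 and later."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    return major_minor == (2, 7) or major_minor >= (3, 4)


def swat_available():
    """Return True if the swat package can be imported by this interpreter."""
    try:
        import swat  # noqa: F401
    except ImportError:
        return False
    return True


print("Supported Python:", is_supported_python())
print("SWAT importable:", swat_available())
```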
That's all there is to setting up Python to talk to CAS. To ensure all the pieces are in place we can run a simple program to interact with CAS. Initiate the command line interface by entering the python command.
Looking at the first line, we see that the version matches the version displayed when we queried the Python package. We are now ready to enter a test program. The following program will load the SWAT package into Python, establish a CAS session, read the cars.csv file from a Git repository and load the data into a CAS table, and finally display the first five rows of the CAS table. When copied and pasted into the command line interface, the commands and statements are interpreted and executed immediately.
Notice that the print command shows the CAS session information, and that the display of rows reports the name of the table. We can also view the session by reviewing User Session in the CAS Monitor. Notice that the "Last Action" indicates table.fetch, which was driven by the csvtbl.head() method.
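As a minimal sketch of the steps just described (not necessarily the exact program used to produce the figures here), the flow can be written with the SWAT calls `swat.CAS`, `read_csv` and `head`. The function names `connect_to_cas` and `load_csv_preview`, the table name and every value in the usage comment are assumptions for illustration. The steps are wrapped in functions so the block can be pasted without a live server; at the interactive prompt the same statements can be entered one at a time.

```python
def connect_to_cas(host, port, username, password):
    """Open a CAS session; requires the SWAT package and a reachable server."""
    import swat  # imported here so the sketch can be defined without SWAT

    return swat.CAS(host, port, username, password)


def load_csv_preview(conn, csv_url, table_name="cars"):
    """Load a CSV file into a CAS table and return its first five rows."""
    # Printing the connection displays the CAS session information
    print(conn)
    # read_csv parses the file on the client and uploads it as a CAS table
    csvtbl = conn.read_csv(csv_url, casout=dict(name=table_name, replace=True))
    # head() returns the first five rows by default
    return csvtbl.head()


# Example usage (all values are placeholders, substitute your own):
# conn = connect_to_cas("cas-controller.example.com", 5570, "sasdemo", "password")
# print(load_csv_preview(conn, "https://example.com/path/to/cars.csv"))
```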
|Figure 2 - CAS Session in CAS Monitor|
That's all there is to it. A few steps and you can begin driving analytics in CAS from a Python session.
Now on to Jupyter Notebook. As noted earlier, Jupyter Notebook is a web application that provides a browser-based programming interface to Python. It is included with an open data science platform known as Anaconda, which ships with its own copy of Python. There are two versions of Anaconda, one based on Python 2.7 and one based on Python 3.5. In short, Python 2.x is legacy and Python 3.x represents the present and future of the language. For more information see this link. Based on this, the recommendation is to choose Anaconda3, which includes Python 3.x. Here's what is required:
The libnuma.so.1 shared library: see the section above that checks for this library.
Installing Anaconda 3 is straightforward. Download and run the installer. The installer can be found at https://www.continuum.io/downloads. The fastest way to retrieve it is by using the wget command.
Once downloaded, install it using this command. There are several prompts: one for license agreement, one for install directory,
Perform a quick check by starting the Python command line interface included with Anaconda and then quit to exit.
The process of installing SWAT is nearly identical to what we performed earlier. Only this time it is necessary to reference the pip installer from Anaconda. This will add the SWAT package to Anaconda's package directory.
At this point the pieces are in place to start the web application. Although the install was performed as root, it is possible to start the web application as another account. Then by default the files created while testing Jupyter Notebook will be stored in the home directory of that account. In this example we use the sasdemo account to start Jupyter Notebook.
You can see from the log shown above that the URL to access the Notebook application is http://0.0.0.0:8888. This address will not work from a remote browser. Simply replace 0.0.0.0 with the actual hostname of the machine running Jupyter Notebook and paste the URL in a browser. You should be presented with a web page like the following.
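The host substitution can also be scripted. The following is a minimal sketch using only the standard library; `notebook_url` is a hypothetical helper and `viya.example.com` is a placeholder hostname.

```python
from urllib.parse import urlsplit, urlunsplit


def notebook_url(logged_url, hostname):
    """Replace the host part of the logged URL, keeping port, path and query."""
    parts = urlsplit(logged_url)
    # Rebuild the network location with the real host and the original port
    netloc = hostname if parts.port is None else "{}:{}".format(hostname, parts.port)
    return urlunsplit(parts._replace(netloc=netloc))


print(notebook_url("http://0.0.0.0:8888", "viya.example.com"))
# prints http://viya.example.com:8888
```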
|Figure 3 - Jupyter Web App|
Now we can open a new Notebook and enter our code. Select Notebook [Root] from the New dropdown box.
|Figure 4 - New Notebook|
This will open a window (cell) in which to enter code. Simply type or copy and paste your code into this window.
|Figure 5 - Empty Cell|
|Figure 6 - Code in Cell|
Once you've added your code, select the "run cell" icon. If the execution is successful, the output will be sent back to the browser as seen below. Notice that windowing, color-coding and formatting greatly improve the readability.
|Figure 7 - Executed Code and Output|
As with the Python command line, only a few steps are needed to install Anaconda and related components and then start the Jupyter Notebook application. Once it is started, it is just as easy to begin submitting CAS actions from Python and returning results to the Python session.
If you made it this far you've discovered that getting started with Python, either via the command line or the web interface, is a fairly simple task. Both of these environments are intended for single-user usage. The command line requires the user to authenticate to the Linux host where SWAT is configured, but Jupyter Notebook as configured here does not require authentication. This means that multiple users could connect to the Notebook application and their files would be visible to each other.
Multi-user capability for Jupyter Notebook can, however, be configured by installing JupyterHub. JupyterHub combines a multi-user hub, an HTTP proxy server and multiple single-user Jupyter notebook servers. Configuring JupyterHub allows users to safely isolate their work and run on independent servers. In addition, TLS can be enabled to ensure encrypted communication. This may be a topic for a future blog.
Documentation on SWAT usage can be found at both the Git page and the SAS download URL listed earlier. The documentation is delivered as a zipped tar file. Once extracted, it can be saved to a directory on a workstation or placed on a web server for access by users of that server.
SAS® Visual Data Mining and Machine Learning (includes a link to Getting Started with SAS Viya for Python)