
Open the door to CAS using the Python client interface

by SAS Employee MarkThomas on ‎07-11-2017 10:35 AM (1,725 Views)

The announcement of SAS Visual Data Mining and Machine Learning (VDMML) in September 2016 introduced the capability to interface with CAS using common third-party languages such as Java, Lua, and Python. As a result, SAS Viya is expected to change the way customers interact with the SAS analytics engine (CAS). Of the supported languages, Python is surging in popularity and is the one most commonly associated with analytics. With this in mind, this blog reviews what is required to establish a connection from Python to CAS and shows some basic interactions.

 

We will review two ways to interact with CAS using Python. The first method is to simply use the command line interface. Once it is invoked, Python commands can be entered and executed interactively. The second method is to use Jupyter Notebook. Jupyter Notebook is a web application that allows you to create and share documents that contain live code, visualizations, equations, and explanatory text. It is bundled with many other packages in an open source data science platform named Anaconda, which is powered by Python.

 

The following diagram shows the components required to submit CAS actions from Python, using either the Python typically delivered with the operating system or the one from an Anaconda installation. In the image below both Python environments exist on the same machine. However, they could be on separate machines, or one or both could be configured on the Viya host (not shown) or on one of the CAS machines. The point is that the placement of the Python session is flexible, as long as the host machine runs Linux. This includes a Linux VM running on a Windows or Mac workstation.

Figure 1 - Python and SWAT Deployment

 

What is needed to use the command line?

Before we get started let's identify the items we need installed in order to access CAS from Python.

  • Python 2.7+ or 3.4+ on Linux
  • Shared library libnuma.so.1
  • pip or pip3
  • SAS Scripting Wrapper for Analytics Transfer (SWAT)
    • SWAT depends on these Python packages: pandas, pytz, numpy, six, requests, python-dateutil (these will be installed when SWAT is installed)

 

Python

Of course you will need Python. And accessing CAS from Python is only supported on Linux. Python 2.7+ and 3.4+ are supported versions. Python is typically installed as part of a base Linux system. Issue the following command to check if it is installed and verify the version.

 

[root@gatekrbhdp02 mnt]# rpm -q python
python-2.7.5-34.el7.x86_64

 

An alternative way to check if Python is installed is to simply enter the python command. This will start an interactive Python session.

 

[root@gatekrbhdp02 ~]# python
Python 2.7.5 (default, Oct 11 2015, 17:47:16)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

 

Notice that the first line of output displays the version. Once verified, enter quit() to exit. If for some reason it is not installed, it can easily be installed using the following command as root or a user with admin privileges.

 

yum install python
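The version requirement can also be checked from inside Python itself. The following is a minimal sketch; the function name is_supported_python is an illustration and not part of SWAT, and it only encodes the 2.7+ and 3.4+ rule stated above.

```python
import sys


def is_supported_python(version_info=None):
    """Return True for Python 2.7+ (2.x line) or Python 3.4+ (3.x line)."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 7
    if major == 3:
        return minor >= 4
    # Anything else falls outside the versions named in this article.
    return False


print("Python %d.%d supported: %s"
      % (sys.version_info[0], sys.version_info[1], is_supported_python()))
```

Passing a tuple such as (2, 7, 5) lets you check a version other than the one currently running.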

 

Python can be on the same machine as the Viya components, CAS components, or on a machine independent of SAS software. However, if it is on a different machine, it must have network access to the machine where the CAS controller is located.
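Network access to the CAS controller can be checked before SWAT is even installed. The sketch below simply attempts a TCP connection using the standard socket module; the helper name can_reach is hypothetical, and a successful result only shows that the port is reachable, not that CAS will accept your credentials.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Covers refused connections, unresolved names and timeouts.
        return False
    conn.close()
    return True


# Example call, using the controller host and port that appear later in
# this article (substitute your own values):
# can_reach('gatekrbhdp01.gatehadoop.com', 5570)
```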

 

libnuma.so.1 library

The libnuma.so.1 shared library is required to make binary protocol connections to CAS. Like Python, in most cases this library will already be available. You can check using the "rpm" command.

 

[root@gatekrbhdp02 mnt]# rpm -q numactl
numactl-2.0.9-6.el7_2.x86_64
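If you prefer to check from Python, or the machine does not use rpm, the standard ctypes.util module can ask the system loader for the library. This is a rough sketch; the function name find_libnuma is an illustration, and the exact name returned depends on the system.

```python
import ctypes.util


def find_libnuma():
    """Return the library name the loader resolves for libnuma, or None."""
    return ctypes.util.find_library("numa")


name = find_libnuma()
if name is None:
    print("libnuma not found; install the numactl package")
else:
    print("libnuma found: %s" % name)
```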

 

Also like Python, if it is not already installed, it is easy to install using the "yum" command as root or a user with admin privileges.

 

yum install numactl

 

pip or pip3

Pip is the package manager for Python and is the tool used to install Python packages. If Python 2.7+ is installed you will need pip, and for Python 3.4+ you will need pip3. Starting with Python 3.4, pip3 is included by default with the Python binary installers. There may be a symbolic link for pip that points to pip3. Check to see if it is installed.

 

[root@gatekrbhdp02 mnt]# rpm -q python-pip
python-pip-7.1.0-1.el7.noarch

 

If it is not installed, install it via the "yum" command.

 

yum install python-pip

 

SWAT

The SAS Scripting Wrapper for Analytics Transfer, or SWAT, is the key package that enables Python to interact with CAS. The SWAT package is dependent upon six other packages, but the pip installer will check for these packages and install them automatically if they are not already installed.

 

It can be obtained either from the download section of support.sas.com or from the SAS Git repository:

 

 

Once downloaded from either of these sites, the package can be installed using the pip command.

 

pip install swat-1.0.0-linux64.tar.gz
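After the install completes, one way to confirm that SWAT and its six dependencies are importable is to try importing each of them. The sketch below does that; the helper name missing_modules is hypothetical. Note that the python-dateutil package is imported under the name dateutil.

```python
import importlib

# Import names for SWAT and the packages it depends on.
# python-dateutil is imported as "dateutil".
REQUIRED_MODULES = ["swat", "pandas", "pytz", "numpy", "six",
                    "requests", "dateutil"]


def missing_modules(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing modules: %s" % (missing_modules(REQUIRED_MODULES) or "none"))
```

Run it with the same Python interpreter whose pip performed the install, otherwise the result reflects a different package directory.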

 

Time to test - Python command line

That's all there is to setting up Python to talk to CAS. To ensure all the pieces are in place we can run a simple program to interact with CAS. Initiate the command line interface by entering the python command.

 

[root@gatekrbhdp02 ~]# python
Python 2.7.5 (default, Oct 11 2015, 17:47:16)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

 

Looking at the first line we see that the version matches the version displayed when we queried the Python package. We are now ready to enter a test program. The following program will load the SWAT package into Python, establish a CAS session, read the cars.csv file from a Git repository and load the data into a CAS table, and finally display the first five rows of the CAS table. When copied and pasted into the command line interface, the commands and statements are interpreted and executed immediately.

 

>>> import swat
>>> portnumber=5570
>>> sess = swat.CAS('gatekrbhdp01.gatehadoop.com', portnumber, 'cas', '******', caslib="casuser")
>>> print (sess)
CAS(u'gatekrbhdp01.gatehadoop.com', 5570, u'cas', protocol=u'cas', name=u'py-session-1', session=u'69d0080e-b453-5f4f-a7de-309c5a9fd91f')
>>> csvtbl = sess.read_csv('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv')
>>> csvtbl.head()
Selected Rows from Table _T_2F53B70E_7F08458B7FE8
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Acura MDX SUV Asia All 36945.0 33337.0 3.5 6.0 265.0 17.0 23.0 4451.0 106.0 189.0
1 Acura TL 4dr Sedan Asia Front 33195.0 30299.0 3.2 6.0 270.0 20.0 28.0 3575.0 108.0 186.0
2 Acura NSX coupe 2dr manual S Sports Asia Rear 89765.0 79978.0 3.2 6.0 290.0 17.0 24.0 3153.0 100.0 174.0
3 Audi A4 3.0 4dr Sedan Europe Front 31840.0 28846.0 3.0 6.0 220.0 20.0 28.0 3462.0 104.0 179.0
4 Audi A6 3.0 4dr Sedan Europe Front 36640.0 33129.0 3.0 6.0 220.0 20.0 27.0 3561.0 109.0 192.0

 

Notice that the print command shows the CAS session information, and the display of rows reports the name of the table. We can also view the session by reviewing User Session in the CAS Monitor. Notice that "Last Action" indicates table.fetch, which was driven by the csvtbl.head() method.

 

Figure 2 - CAS Session in CAS Monitor

 

That's all there is to it. A few steps and you can begin driving analytics in CAS from a Python session.  
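The same interactive test can be saved as a script. The sketch below is one possible arrangement, not the only one: the CAS_HOST, CAS_PORT, CAS_USER and CAS_PASSWORD environment variable names and both function names are assumptions made for illustration, while the swat.CAS, read_csv, head and close calls mirror the session shown above. Reading the password from the environment keeps it out of the script.

```python
import os

CARS_CSV = ("https://raw.githubusercontent.com/sassoftware/"
            "sas-viya-programming/master/data/cars.csv")


def connection_settings(environ=None):
    """Build connection settings from hypothetical CAS_* variables."""
    if environ is None:
        environ = os.environ
    return {
        "hostname": environ.get("CAS_HOST", "gatekrbhdp01.gatehadoop.com"),
        "port": int(environ.get("CAS_PORT", "5570")),
        "username": environ.get("CAS_USER", "cas"),
        "password": environ.get("CAS_PASSWORD"),
    }


def first_rows(settings, csv_url=CARS_CSV, n=5):
    """Load a CSV file into CAS and return its first n rows."""
    import swat  # imported here so the settings helper works without SWAT

    sess = swat.CAS(settings["hostname"], settings["port"],
                    settings["username"], settings["password"],
                    caslib="casuser")
    try:
        tbl = sess.read_csv(csv_url)
        return tbl.head(n)
    finally:
        # Close the connection even if the load or fetch fails.
        sess.close()


# Example call (requires SWAT and a reachable CAS server):
# print(first_rows(connection_settings()))
```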

 

What about Jupyter Notebook?

 

Now on to Jupyter Notebook. As noted earlier, Jupyter Notebook is a web application that provides a browser-based programming interface to Python. It is included with an open data science platform known as Anaconda. Anaconda includes its own copy of Python. There are two versions of Anaconda, one based on Python 2.7 and one based on Python 3.5. In short Python 2.x is legacy and Python 3.x represents the present and future of the language. For more information see this link. Based on this information the recommendation is to choose Anaconda3, which includes Python 3.x. Here's what is required:

 

  • Shared library libnuma.so.1 (package numactl)
  • Anaconda2 or 3 - the following are included depending on the version of Anaconda
    • pip or pip3
    • Python 2.7+ or 3.4+
  • SAS Scripting Wrapper for Analytics Transfer (SWAT)
    • Dependent Python packages: pandas, pytz, numpy, six, requests, python-dateutil - will be installed when SWAT is installed

 

libnuma.so.1 library

See the section above that checks for this library.  

 

Anaconda3

Installing Anaconda 3 is straightforward. Download and run the installer. The installer can be found at https://www.continuum.io/downloads. The fastest way to retrieve it is by using the wget command.

 

 

Once downloaded, install it using the command below. The installer presents several prompts, including one for the license agreement and one for the install directory.

 

bash Anaconda3-4.1.1-Linux-x86_64.sh

 

Perform a quick check by starting the Python command line interface included with Anaconda and then quit to exit.

 

[root@gatekrbhdp01 ~]# /root/anaconda3/bin/python
Python 3.5.2 |Anaconda 4.1.1 (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

 

SWAT

The process of installing SWAT is nearly identical to what we performed earlier. Only this time it is necessary to reference the pip installer from Anaconda. This will add the SWAT package to Anaconda's package directory.

 

/root/anaconda3/bin/pip install swat-1.0.0-linux64.tar.gz

 

At this point the pieces are in place to start the web application. Although the install was performed as root, it is possible to start the web application as another account. Then by default the files created while testing Jupyter Notebook will be stored in the home directory of that account. In this example we use the sasdemo account to start Jupyter Notebook.

 

[sasdemo@gatekrbhdp02 ~]$ /opt/anaconda3/bin/jupyter notebook --no-browser --ip 0.0.0.0
[W 14:31:05.303 NotebookApp] Unrecognized JSON config file version, assuming version 1
[I 14:31:05.703 NotebookApp] [nb_conda_kernels] enabled, 1 kernels found
[I 14:31:06.124 NotebookApp] [nb_anacondacloud] enabled
[I 14:31:06.129 NotebookApp] [nb_conda] enabled
[I 14:31:06.190 NotebookApp] nbpresent HTML export ENABLED
[W 14:31:06.190 NotebookApp] nbpresent PDF export DISABLED: No module named 'nbbrowserpdf'
[I 14:31:06.198 NotebookApp] Serving notebooks from local directory: /home/sasdemo
[I 14:31:06.198 NotebookApp] 0 active kernels
[I 14:31:06.198 NotebookApp] The Jupyter Notebook is running at: http://0.0.0.0:8888/
[I 14:31:06.198 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

 

Test via Notebook

You can see from the log shown above that the URL to access the Notebook application is http://0.0.0.0:8888. That address will not work from a remote browser, because 0.0.0.0 only indicates that the server is listening on all network interfaces. Simply replace 0.0.0.0 with the actual hostname and paste the URL into a browser. You should be presented with a web page like the following.

Figure 3 - Jupyter Web App
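The hostname substitution described above is easy to script if you start the Notebook server often. This is a small sketch; the function name notebook_url is hypothetical, and socket.getfqdn() is only used as a fallback when no hostname is supplied.

```python
import socket


def notebook_url(logged_url, hostname=None):
    """Replace the 0.0.0.0 placeholder in the logged URL with a hostname."""
    if hostname is None:
        # Fall back to this machine's fully qualified name.
        hostname = socket.getfqdn()
    return logged_url.replace("//0.0.0.0", "//" + hostname, 1)


print(notebook_url("http://0.0.0.0:8888/", "gatekrbhdp02"))
# prints http://gatekrbhdp02:8888/
```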

 

Now we can open a new Notebook and enter our code. Select Notebook [Root] from the New dropdown box.

Figure 4 - New Notebook

 

This will open a window (cell) in which to enter code. Simply type or copy and paste your code into this window.  

Figure 5 - Empty Cell

 

Figure 6 - Code in Cell

 

Once you've added your code, select the "run cell" icon. If the execution is successful, the output will be sent back to the browser as seen below. Notice that windowing, color-coding and formatting greatly improve the readability.

Figure 7 - Executed Code and Output

 

As with the Python command line, only a few steps are needed to install Anaconda and related components and then start the Jupyter Notebook application. Once it is started, it is easy to head down the road of generating CAS actions from Python and returning results to the Python session.

 

Final Thoughts

If you made it this far you've discovered that getting started with Python, either via the command line or web interface, is a fairly simple task. Both of these environments address single user usage. The command line requires the user to authenticate to the Linux host where SWAT is configured, but Jupyter Notebook does not require authentication. This means that multiple users could connect to the Notebook application and their files would be visible to each other.

 

On the other hand, multi-user capability for Jupyter Notebook can also be configured by installing JupyterHub. JupyterHub combines a multi-user hub, an http proxy server and multiple single-user Jupyter notebook servers. Configuring JupyterHub allows users to safely isolate work and run on independent servers. In addition TLS can be enabled to ensure encrypted communication. This may be a topic for a future blog.

 

Documentation on SWAT usage can be found on both the Git page and SAS download URLs listed earlier. The documentation is saved as a zipped tar file which when extracted can simply be saved to a directory on a workstation or stored on a web server for access by users with access to that web server.

 

Associated links:

SAS® Visual Data Mining and Machine Learning (includes a link to Getting Started with SAS Viya for Python)

GitHub Sample Notebooks

Getting Started with SAS® Viya™ 3.2 for Python

SAS for Developers

 

Happy trails!
