
Open the door to CAS using the Python client interface

Started ‎07-11-2017 by
Modified ‎03-30-2019 by

The announcement of SAS VDMML in September 2016 introduced the capability to interface with CAS using common third-party languages such as Java, Lua and Python. As a result, SAS Viya is expected to change the way customers interact with the SAS analytics engine (CAS). Of the supported languages, Python is surging in popularity and is the one most commonly associated with analytics. With this in mind, this blog reviews what is required to establish a connection from Python to CAS and shows some basic interactions.

 

We will review two ways to interact with CAS using Python. The first method is to simply use the command line interface. Once it is invoked, Python commands can be entered and executed interactively. The second method is to use Jupyter Notebook, a web application that allows you to create and share documents that contain live code, visualizations, equations and explanatory text. It is bundled with many other packages in an open source data science platform named Anaconda, which is powered by Python.

 

The following diagram reflects the components required to submit CAS actions from Python, either using the Python typically delivered with the operating system or from an Anaconda installation. In the image below both Python environments exist on the same machine. However, they could be on separate machines, or one or both could be configured on the Viya host (not shown) or on one of the CAS machines. The point is that the placement or origin of the Python session is flexible, as long as the host machine runs Linux. This could include a Linux VM running on a Windows or Mac workstation.

mdt_25_python_01.png
Figure 1 - Python and SWAT Deployment

 

What is needed to use the command line?

Before we get started let's identify the items we need installed in order to access CAS from Python.

  • Python 2.7+ or 3.4+ on Linux
  • Shared library libnuma.so.1
  • pip or pip3
  • SAS Scripting Wrapper for Analytics Transfer (SWAT)
    • SWAT depends on these Python packages: pandas, pytz, numpy, six, requests, python-dateutil (these will be installed when SWAT is installed)

 

Python

Of course you will need Python. And accessing CAS from Python is only supported on Linux. Python 2.7+ and 3.4+ are supported versions. Python is typically installed as part of a base Linux system. Issue the following command to check if it is installed and verify the version.

 

[root@gatekrbhdp02 mnt]# rpm -q python
python-2.7.5-34.el7.x86_64

 

An alternative way to check if Python is installed is to simply enter the python command. This will start an interactive Python session.

 

[root@gatekrbhdp02 ~]# python
Python 2.7.5 (default, Oct 11 2015, 17:47:16)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

 

Notice that the first line of output displays the version. Once verified, enter quit() to exit. If for some reason it is not installed, it can easily be installed using the following command as root or a user with admin privileges.

 

yum install python
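The version rule described above can also be checked from inside Python itself. The following is a minimal sketch; the function name is_supported is ours, and it simply encodes the 2.7+ / 3.4+ requirement stated in this article.

```python
import sys


def is_supported(version_info):
    """Return True for Python 2.7+ on the 2.x line or 3.4+ on the 3.x line."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 7
    if major == 3:
        return minor >= 4
    # Anything outside 2.x/3.x is not covered by the requirement above.
    return False


if __name__ == "__main__":
    # sys.version starts with the dotted version number, e.g. "2.7.5".
    print(sys.version.split()[0], "supported:", is_supported(sys.version_info))
```

Passing a plain tuple such as (2, 7, 5) works as well as sys.version_info, which makes the check easy to try against the versions shown in the rpm output.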

 

Python can be on the same machine as the Viya components, CAS components, or on a machine independent of SAS software. However, if it is on a different machine, it must have network access to the machine where the CAS controller is located.
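A quick way to confirm that network access exists is to open a TCP connection to the CAS controller port from the machine where Python runs. This is only a sketch written for Python 3; the host name in the usage note is a placeholder, and 5570 is the port used in the test program later in this article. A successful connection shows only that the port is reachable, not that CAS will accept your credentials.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and connects; the with block
        # closes the socket again as soon as the connection succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, can_reach("cas-controller.example.com", 5570) would return False if a firewall blocks the port or the host name cannot be resolved.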

 

libnuma.so.1 library

The libnuma.so.1 shared library is required to make binary protocol connections to CAS. Like Python, in most cases this library will already be available. You can check using the "rpm" command.

 

[root@gatekrbhdp02 mnt]# rpm -q numactl
numactl-2.0.9-6.el7_2.x86_64

 

Also like Python, if it is not already installed, it is easy to install using the "yum" command as root or a user with admin privileges.

 

yum install numactl
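The rpm query shows that the package is installed. To confirm that the dynamic loader can actually find the library from a Python session, something like the following sketch can be used. It relies only on the standard ctypes module; the function name libnuma_available is ours.

```python
import ctypes


def libnuma_available(name="libnuma.so.1"):
    """Return True if the shared library can be loaded by this process."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False
    return True


if __name__ == "__main__":
    print("libnuma.so.1 loadable:", libnuma_available())
```

A result of False on a machine where rpm reports numactl as installed would suggest a library path problem rather than a missing package.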

 

pip or pip3

Pip is the package manager for Python and is the tool used to install packages in Python. If Python 2.7+ is installed you will need pip, and for Python 3.4+ you will need pip3. Starting with Python 3.4, pip3 is included by default with the Python binary installers. There may be a symbolic link for pip that points to pip3. Check to see if it is installed.

 

[root@gatekrbhdp02 mnt]# rpm -q python-pip
python-pip-7.1.0-1.el7.noarch

 

If it is not installed, install it via the "yum" command.

 

yum install python-pip

 

SWAT

The SAS Scripting Wrapper for Analytics Transfer, or SWAT, is the key package that enables Python to interact with CAS. The SWAT package is dependent upon six other packages, but the pip installer will check for these packages and install them automatically if they are not already installed.

 

It can be obtained either from the download section of support.sas.com or from the SAS Git repository:

 

 

Once downloaded from either of these sites, the package can be installed using the pip command.

 

pip install swat-1.0.0-linux64.tar.gz
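After the install completes, it can be worth confirming that SWAT and the six packages it depends on are visible to the interpreter you plan to use. The sketch below uses only the standard library and does not import the packages themselves. The module list reflects the dependencies named in this article; note that python-dateutil is imported under the name dateutil. It is written for Python 3.4 or later.

```python
import importlib.util

# Import names for SWAT and the six dependencies listed in this article.
# The python-dateutil package is imported as "dateutil".
MODULES = ["swat", "pandas", "pytz", "numpy", "six", "requests", "dateutil"]


def missing_modules(names=MODULES):
    """Return the import names that cannot be found in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means every package can be located; any names returned point to what still needs to be installed with pip.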

 

Time to test - Python command line

That's all there is to setting up Python to talk to CAS. To ensure all the pieces are in place we can run a simple program to interact with CAS. Initiate the command line interface by entering the python command.

 

[root@gatekrbhdp02 ~]# python
Python 2.7.5 (default, Oct 11 2015, 17:47:16)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

 

Looking at the first line we see that the version matches the version displayed when we queried the Python package. We are now ready to enter a test program. The following program will load the SWAT package into Python, establish a CAS session, read the cars.csv file from a Git repository and load the data into a CAS table, and finally display the first five rows of the CAS table. When copied and pasted into the command line interface, the commands and statements are interpreted and executed immediately.

 

>>> import swat
>>> portnumber=5570
>>> sess = swat.CAS('gatekrbhdp01.gatehadoop.com', portnumber, 'cas', '******', caslib="casuser")
>>> print (sess)
CAS(u'gatekrbhdp01.gatehadoop.com', 5570, u'cas', protocol=u'cas', name=u'py-session-1', session=u'69d0080e-b453-5f4f-a7de-309c5a9fd91f')
>>> csvtbl = sess.read_csv('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv')
>>> csvtbl.head()
Selected Rows from Table _T_2F53B70E_7F08458B7FE8
Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 Acura MDX SUV Asia All 36945.0 33337.0 3.5 6.0 265.0 17.0 23.0 4451.0 106.0 189.0
1 Acura TL 4dr Sedan Asia Front 33195.0 30299.0 3.2 6.0 270.0 20.0 28.0 3575.0 108.0 186.0
2 Acura NSX coupe 2dr manual S Sports Asia Rear 89765.0 79978.0 3.2 6.0 290.0 17.0 24.0 3153.0 100.0 174.0
3 Audi A4 3.0 4dr Sedan Europe Front 31840.0 28846.0 3.0 6.0 220.0 20.0 28.0 3462.0 104.0 179.0
4 Audi A6 3.0 4dr Sedan Europe Front 36640.0 33129.0 3.0 6.0 220.0 20.0 27.0 3561.0 109.0 192.0
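The same steps can be saved as a small script rather than typed interactively. The following is a sketch under a few assumptions: the connection details are read from environment variables (CASHOST, CASPORT, CAS_USER and CAS_PASSWORD are names chosen for this illustration, not something SWAT requires here), and the swat import is placed inside the function so the file can be loaded even on a machine where SWAT is not yet installed.

```python
import os

CARS_URL = ("https://raw.githubusercontent.com/sassoftware/"
            "sas-viya-programming/master/data/cars.csv")


def connection_settings(env=None):
    """Collect CAS connection details from environment variables.

    The variable names are illustrative; adjust them to your site.
    """
    env = os.environ if env is None else env
    return {
        "hostname": env.get("CASHOST", "localhost"),
        "port": int(env.get("CASPORT", "5570")),
        "username": env.get("CAS_USER"),
        "password": env.get("CAS_PASSWORD"),
    }


def first_rows(settings, url=CARS_URL, rows=5):
    """Connect to CAS, load the CSV into a CAS table, return the first rows."""
    import swat  # imported here so this file loads where SWAT is absent

    sess = swat.CAS(settings["hostname"], settings["port"],
                    settings["username"], settings["password"])
    try:
        tbl = sess.read_csv(url)
        return tbl.head(rows)
    finally:
        sess.close()  # close the connection even if the load fails


if __name__ == "__main__":
    print(first_rows(connection_settings()))
```

Keeping the password out of the source file avoids the need to mask it, as was done in the session above.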

 

Notice that the print command shows the CAS session information, and the display of rows reports the name of the table. We can also view the session by reviewing User Session in the CAS Monitor. Notice that "Last Action" indicates table.fetch, which was driven by the csvtbl.head() method.

 

2.png
Figure 2 - CAS Session in CAS Monitor

 

That's all there is to it. A few steps and you can begin driving analytics in CAS from a Python session.  

 

What about Jupyter Notebook?

 

Now on to Jupyter Notebook. As noted earlier, Jupyter Notebook is a web application that provides a browser-based programming interface to Python. It is included with an open data science platform known as Anaconda. Anaconda includes its own copy of Python. There are two versions of Anaconda, one based on Python 2.7 and one based on Python 3.5. In short, Python 2.x is legacy and Python 3.x represents the present and future of the language. For more information see this link. Based on this information, the recommendation is to choose Anaconda3, which includes Python 3.x. Here's what is required:

 

  • Shared library libnuma.so.1 (package numactl)
  • Anaconda2 or 3 - the following are included depending on the version of Anaconda
    • pip or pip3
    • Python 2.7+ or 3.4+
  • SAS Scripting Wrapper for Analytics Transfer (SWAT)
    • Dependent Python packages: pandas, pytz, numpy, six, requests, python-dateutil - will be installed when SWAT is installed

 

libnuma.so.1 library

See the section above that checks for this library.  

 

Anaconda3

Installing Anaconda 3 is straightforward. Download and run the installer. The installer can be found at https://www.continuum.io/downloads. The fastest way to retrieve it is by using the wget command.

 

 

Once downloaded, install it using this command. There are several prompts, including one for the license agreement and one for the install directory.

 

bash Anaconda3-4.1.1-Linux-x86_64.sh

 

Perform a quick check by starting the Python command line interface included with Anaconda and then quit to exit.

 

[root@gatekrbhdp01 ~]# /root/anaconda3/bin/python
Python 3.5.2 |Anaconda 4.1.1 (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

 

SWAT

The process of installing SWAT is nearly identical to what we performed earlier. Only this time it is necessary to reference the pip installer from Anaconda. This will add the SWAT package to Anaconda's package directory.

 

/root/anaconda3/bin/pip install swat-1.0.0-linux64.tar.gz
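Because a machine may now hold two Python installations, it helps to be explicit about which one receives the package. One common way to do that is to run pip as a module of the target interpreter. The sketch below only builds and runs that command; the function names are ours, and the interpreter path shown in the usage note is the Anaconda location used in this example.

```python
import subprocess
import sys


def pip_install_command(package, python=sys.executable):
    """Build a pip command that installs into the given interpreter."""
    return [python, "-m", "pip", "install", package]


def install(package, python=sys.executable):
    """Run the install; raises CalledProcessError if pip reports a failure."""
    subprocess.check_call(pip_install_command(package, python))
```

For instance, install("swat-1.0.0-linux64.tar.gz", "/root/anaconda3/bin/python") would place SWAT in Anaconda's package directory regardless of which pip comes first on the PATH.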

 

At this point the pieces are in place to start the web application. Although the install was performed as root, it is possible to start the web application as another account. Then by default the files created while testing Jupyter Notebook will be stored in the home directory of that account. In this example we use the sasdemo account to start Jupyter Notebook.

 

[sasdemo@gatekrbhdp02 ~]$ /opt/anaconda3/bin/jupyter notebook --no-browser --ip 0.0.0.0
[W 14:31:05.303 NotebookApp] Unrecognized JSON config file version, assuming version 1
[I 14:31:05.703 NotebookApp] [nb_conda_kernels] enabled, 1 kernels found
[I 14:31:06.124 NotebookApp] [nb_anacondacloud] enabled
[I 14:31:06.129 NotebookApp] [nb_conda] enabled
[I 14:31:06.190 NotebookApp] nbpresent HTML export ENABLED
[W 14:31:06.190 NotebookApp] nbpresent PDF export DISABLED: No module named 'nbbrowserpdf'
[I 14:31:06.198 NotebookApp] Serving notebooks from local directory: /home/sasdemo
[I 14:31:06.198 NotebookApp] 0 active kernels
[I 14:31:06.198 NotebookApp] The Jupyter Notebook is running at: http://0.0.0.0:8888/
[I 14:31:06.198 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

 

Test via Notebook

You can see from the log shown above that the URL to access the Notebook application is http://0.0.0.0:8888. Obviously this will not work from a browser on another machine. Simply replace 0.0.0.0 with the actual hostname and paste the URL in a browser. You should be presented with a web page like the following.

3.png
Figure 3 - Jupyter Web App
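The hostname substitution described above can be scripted if you want to hand users a ready-made link. This is a small sketch using the standard library in Python 3; the function name browser_url is ours, and socket.getfqdn() is used as a guess at the name other machines should use, which may need to be overridden at your site.

```python
import socket
from urllib.parse import urlsplit, urlunsplit


def browser_url(logged_url, hostname=None):
    """Replace the host in the logged URL with a name a browser can reach."""
    parts = urlsplit(logged_url)
    host = hostname or socket.getfqdn()
    # Keep the port from the log (8888 in the example above) if one is present.
    netloc = host if parts.port is None else "{}:{}".format(host, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


if __name__ == "__main__":
    print(browser_url("http://0.0.0.0:8888/"))
```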

 

Now we can open a new Notebook and enter our code. Select Notebook [Root] from the New dropdown box.

3(2).png
Figure 4 - New Notebook

 

This will open a window (cell) in which to enter code. Simply type or copy and paste your code into this window.  

4.png
Figure 5 - Empty Cell

 

mdt_25_python_07_pswd_blur.png
Figure 6 - Code in Cell

 

Once you've added your code, select the "run cell" icon. If the execution is successful, the output will be sent back to the browser as seen below. Notice that windowing, color-coding and formatting greatly improve the readability.

mdt_25_python_10_pswd_blur.png
Figure 7 - Executed Code and Output

 

Like the Python command line, there are only a few steps to install Anaconda and related components and subsequently start the Jupyter Notebook application. Once it is started, it is just as easy to head down the road of generating CAS actions from Python and returning results to the Python session.

 

Final Thoughts

If you made it this far you've discovered that getting started with Python, either via the command line or web interface, is a fairly simple task. Both of these environments are geared toward single-user usage. The command line requires the user to authenticate to the Linux host where SWAT is configured, but Jupyter Notebook as started here does not require authentication. This means that multiple users could connect to the Notebook application and their files would be visible to each other.

 

On the other hand, multi-user capability for Jupyter Notebook can be configured by installing JupyterHub. JupyterHub combines a multi-user hub, an HTTP proxy server and multiple single-user Jupyter notebook servers. Configuring JupyterHub allows users to safely isolate work and run on independent servers. In addition, TLS can be enabled to ensure encrypted communication. This may be a topic for a future blog.

 

Documentation on SWAT usage can be found on both the Git page and the SAS download URLs listed earlier. The documentation is saved as a zipped tar file. Once extracted, it can be saved to a directory on a workstation or stored on a web server for the users who have access to that server.

 

Associated links:

SAS® Visual Data Mining and Machine Learning (includes a link to Getting Started with SAS Viya for Python)

GitHub Sample Notebooks

Getting Started with SAS® Viya™ 3.2 for Python

SAS for Developers

 

Happy trails!

Comments

Hello,

 

I am using SAS Viya for Learners in JupyterLab platform.

It appears that I have established the connection to CAS.

Issue: I am unable to read the data file and generate descriptive statistics.

I can't figure out what part of my code (appended below)  is wrong and how to fix it.

Any help would be appreciated.

Thanks,

 

 import os
 import sys
 import swat
 conn = swat.CAS(os.environ.get("CASHOST"), os.environ.get("CASPORT"),None,os.environ.get("SAS_VIYA_TOKEN")) 
In [12]:
out = conn.serverstatus()
 
NOTE: Grid node action status report: 1 nodes, 8 total actions executed.
In [13]:
out
Out[13]:
§ About
{'CAS': 'Cloud Analytic Services', 'Version': '3.04', 'VersionLong': 'V.03.04M0P07122018', 'Copyright': 'Copyright © 2014-2018 SAS Institute Inc. All Rights Reserved.', 'ServerTime': '2019-07-18T15:14:47Z', 'System': {'Hostname': 'pdcesx23045', 'OS Name': 'Linux', 'OS Family': 'LIN X64', 'OS Release': '3.10.0-693.el7.x86_64', 'OS Version': '#1 SMP Tue Aug 22 21:09:27 UTC 2017', 'Model Number': 'x86_64', 'Linux Distribution': 'CentOS Linux release 7.4.1708 (Core)'}, 'license': {'site': 'DEMOCENTER - Viya For Learners', 'siteNum': 70180938, 'expires': '05Dec2019:00:00:00', 'gracePeriod': 45, 'warningPeriod': 45, 'maxCPUs': 9999}}

§ server
Server Status
   nodes  actions
0      1        8

§ nodestatus
Node Status
                        name        role  uptime  running  stalled
0  pdcesx23045.exnet.sas.com  controller   0.108        0        0
 

elapsed 0.000307s · sys 0.000297s · mem 0.312MB

In [14]:
#conn.help(actionset='simple');
In [ ]:
tbl = conn.read_csv('https://raw.githubusercontent.com/'
                             'sassoftware/sas-viya-programming/master/data/cars.csv',
                             casout='cars')
   
In [ ]:
conn.summary(table=tbl)