In this article, learn how to make it much easier for your analytics developers (data scientists and data engineers) to access Cloud Analytic Services (CAS) actions using the Python swat package inside SAS Studio!
Doing so shortens time to value and exploits the seamless interoperability between the Python and CAS runtimes.
Show folks a new way of doing something, and you'll receive more pushback than gratitude ;).
This sad but wise truth hits home when we observe the response to recent enhancements within SAS Studio on Viya 4. New additions - the Python code editor and Proc Python - allow for greater integration and interoperability with Python. Easier process and data handoffs between the SAS and Python runtimes open up many development opportunities in standalone programs as well as in SAS Studio Flows and Steps.
But what prevents users from taking advantage of them?
Users can choose to transfer data from Python to either SAS datasets (in Compute) or CAS (in-memory) tables. The Compute side is easily accomplished through the SAS module out of the box. For Cloud Analytic Services (CAS), however, users need to connect explicitly using the python-swat package, and they often become uncertain and frustrated about some of the connection parameters. Common areas of angst are:
1. Hostname : "...do I provide the entire SAS Viya URL, or a specific endpoint? "
2. Authentication (password-based) : "Yikes! No way will I ever enter a password inside an editor that does not mask it!"
3. Authentication (token-based) : "Go to blazes! It's no fun having to make all those API calls and get a token!"
4. Secure Client-Server Communication : "What IS all this about SSL certificates? I need to get this from an admin. I'm not gonna bother!"
As a result, many users lose motivation and do not seek the opportunities offered by unified access to Python and SAS. Let's look at ways to make it easier for them.
We base our solution on a simple principle - developer personas should not worry about connection parameters. We use automation and environment variables available within a SAS Studio session to help us. Some awareness of SAS Viya's containerised architecture comes in useful. Every SAS Studio session is initialised within a virtual 'machine' (called a pod), and this pod is based upon a "context", i.e., information which specifies the conditions under which the pod operates. The context contains many environment variables (or, if they are more relatable, 'macro variables') and for this problem, there are two variables we find useful:
a. SAS_SERVICES_TOKEN : This helps with token-based authentication. Even better, it gives us an alternative to password-based authentication, which is the weaker and more vulnerable mechanism. This environment variable contains an OAuth token which is used to call SAS Viya APIs. It's similar to the type of token you obtain using registered external clients, and its validity and scope depend on the SAS Studio session.
b. SSLCALISTLOC : The location of the SSL certificate within the pod. A trusted certificate lets the Python runtime (remember, Python is considered an external application to Viya) complete the handshake for secure communication with SAS Viya. Usually, the administrator needs to extract this certificate (using kubectl or other tools) and make it available to data engineers and data scientists, who then have to remember to refer to this certificate in order to establish a CAS session. Using this environment variable, administrators can simply point the client application at the location of an SSL certificate. That is one less step for both administrators and users to worry about.
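To make the two variables concrete, here is a minimal sketch of reading them from the session environment and failing with a readable message when one is absent. The helper name session_connection_settings and the keys of the returned dictionary are illustrative assumptions, not part of any SAS API; only the two environment variable names come from the article.

```python
import os


def session_connection_settings(environ=os.environ):
    """Return the token and CA bundle path exposed by the SAS Studio session.

    The function name and the returned keys are hypothetical; only the two
    environment variable names are taken from the article.
    """
    required = ("SAS_SERVICES_TOKEN", "SSLCALISTLOC")
    # Collect every variable that is unset or empty so the error names them all.
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Not set in this session: " + ", ".join(missing))
    return {
        "token": environ["SAS_SERVICES_TOKEN"],
        "ca_bundle": environ["SSLCALISTLOC"],
    }


# Inside a SAS Studio session one would call:
#   settings = session_connection_settings()
```

Passing the environment in as an argument keeps the check easy to try outside SAS Studio, where neither variable exists.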
Let's get back to another source of confusion - what's the best way to refer to the hostname? It depends on how you choose to deploy CAS ports as external services. Two ports are exposed on the sas-cas-server-default-client Service: 5570 for the binary protocol, and 8777 for HTTPS. You can directly use this service name and the desired port in your connection string. At the backend, this gets translated to the IP of the pod which this service points to, but there's no need for users to worry about that. Some administrators go further and publish the sas-cas-server-default-bin and sas-cas-server-default-http services as NodePorts, or as load balancers. These give you additional options, and their names act as a reference to the underlying IP.
The final challenge - automate and avoid having users code all these steps! This is easily accomplished through good ol' autoexec - a set of commands which are run whenever a new SAS session is started. SAS Viya allows defining compute contexts which contain statements for autoexec.sas. Administrators code the following block inside a compute context, and this leads to a single variable (conn), the CAS connection object, which users can leverage straightaway. Here's the code in all its final glory!
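The choice between the two ports can be sketched as a small lookup that produces keyword arguments for swat.CAS. This is only an illustration: the helper name cas_endpoint and the constants are assumptions, while the service name and port numbers are the ones quoted above, and hostname, port and protocol are existing swat.CAS parameters.

```python
# Service name and ports as described in the article.
CAS_SERVICE = "sas-cas-server-default-client"
CAS_PORTS = {"cas": 5570, "https": 8777}


def cas_endpoint(protocol="cas", hostname=CAS_SERVICE):
    """Build hostname/port/protocol keyword arguments for swat.CAS.

    Hypothetical helper: 'cas' selects the binary port, 'https' the REST port.
    """
    if protocol not in CAS_PORTS:
        raise ValueError("protocol must be one of: " + ", ".join(sorted(CAS_PORTS)))
    return {"hostname": hostname, "port": CAS_PORTS[protocol], "protocol": protocol}


# Inside SAS Studio the result could be passed straight to swat, for example:
#   conn = swat.CAS(**cas_endpoint("https"), password=os.environ["SAS_SERVICES_TOKEN"])
```

If an administrator has published a NodePort or load balancer instead, the hostname argument is where that name would go; the port would then need to match whatever was published.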
# Import necessary packages
import swat, os
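# Add certificate location to operating system's list of trusted certs.
# (CAS_CLIENT_SSL_CA_LIST is the variable python-swat reads to find its CA bundle.)
os.environ['CAS_CLIENT_SSL_CA_LIST'] = os.environ['SSLCALISTLOC']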
# Connect to CAS
conn = swat.CAS(hostname="sas-cas-server-default-client",port=5570, password=os.environ['SAS_SERVICES_TOKEN'])
Note : the above is written as a simple Python script for clarity. Within an autoexec.sas, it needs to be wrapped inside, or referenced by, a proc python block.
The user can open a new SAS Studio session and test the connection by simply displaying the conn object from a Python script. The output begins like this:
CAS('sas-cas-server-default-client', 5570, 'my-user-id', protocol='cas', name='py-session-1',
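Once conn exists, the Python-to-CAS hand-off that motivated all this takes a single call. Below is a minimal sketch, assuming pandas is available in the session and that the user's personal caslib is named casuser; the helper name upload_to_cas and the table name in the usage comment are made up for illustration, while upload_frame is the python-swat method for loading a DataFrame.

```python
def upload_to_cas(conn, frame, table_name, caslib="casuser"):
    """Load a pandas DataFrame into an in-memory CAS table.

    Hypothetical convenience wrapper around conn.upload_frame; returns
    whatever upload_frame returns (a handle to the new CAS table).
    """
    return conn.upload_frame(
        frame,
        # replace=True overwrites a table of the same name in that caslib.
        casout={"name": table_name, "caslib": caslib, "replace": True},
    )


# Usage inside SAS Studio, with the conn object created by the autoexec:
#   import pandas as pd
#   scores = pd.DataFrame({"id": [1, 2, 3], "score": [0.2, 0.5, 0.9]})
#   cas_table = upload_to_cas(conn, scores, "scores")
#   print(cas_table.head())
```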
There you have it. You have either empowered your users with faster time to value or, for some users ... stimulated their creative minds to search for other excuses to avoid working with Python and CAS within SAS Studio!!!
Hopefully, more of the former. Cheers.