
Better Open-source Integration through the Python - Load Objects to SAS Custom Step

Started ‎11-09-2023 by
Modified ‎11-09-2023 by
Analytics developers require flexible and integrated pipelines where they can access all available tools for their needs.  
 
Sometimes, developers may wish to use methods and functionality from open-source languages such as Python and R. SAS Viya provides access to these languages through its open-source integration and, specifically in SAS Studio, through procedures such as Proc Python.
 
A challenge is that the Python and SAS compute engines operate in separate environments. Data needs to move between those environments seamlessly in order to take full advantage of the integration. A new contribution, the "Python - Load Objects to SAS" custom step, helps facilitate this data exchange.
 
To illustrate, once your Python program has done its job, you can transfer the desired data objects to a SAS Viya in-memory environment, or to the SAS Programming Runtime Environment (SPRE), to access specific functionality or gain better performance. At the same time, you can easily free up the memory these objects occupy within Python.
 
 
Python - Load Objects to SAS.gif
 
 
An important note: this step may be new, but similar capability has existed for a while in the form of the SAS object in Proc Python. This custom step extends that functionality. Those who are familiar with the SAS object and its SAS.df2sd (data frame to SAS dataset) method are free to continue using them (the method is also used within this custom step). The additional benefits provided by this custom step are:
 
1. It provides a low-code wrapper around the data exchange process, making the exchange visible rather than buried in code.
 
2. It supports more Python data objects beyond pandas data frames, such as single objects (strings and integers), lists, and dictionary objects.
 
3. It promotes good memory and object management by deleting the Python object after transfer and then running garbage collection.
 
Those who are new to the paradigm of programming with SAS and Python in a combined fashion will find this step a useful aid in development of their programs.
 
 
Access and Use
 
This custom step is part of the SAS Studio Custom Steps GitHub repository, a collection of low-code components providing a productive and enjoyable developer experience. These steps provide a user interface for entering parameters, abstract away the underlying code, and enable code reuse across many analytics tasks and programs.
 
Access the "Python - Load Objects to SAS" step from:
 
Link to the repository folder: Python - Load Objects to SAS
Link to the README: README
 
 
 
To use this step in SAS Studio within a SAS Viya environment, the recommended approach is to follow the instructions for uploading a selected custom step to SAS Viya. An alternative is to use the Git integration already available in SAS Studio: clone the SAS Studio Custom Steps GitHub repository and copy the required custom steps into your SAS Content folders. Refer to this post for some useful tips.

 

 

Application

 
The most common use of this step is expected to be transferring pandas data frames to a corresponding SAS table (either a sas7bdat dataset or an in-memory table in SAS Cloud Analytic Services (CAS)). Pandas data frames are among the most commonly used Python data structures in data science, and analytics practitioners may use them to carry out transformations such as calculating a new column, transforming columns and reshaping data inside Python.
 
 
As mentioned earlier, a built-in SAS.df2sd method exists in Proc Python, which is meant for transferring data frames to target SAS tables. The SAS object has been created for use within a SPRE environment (also known as SAS compute), but can also be used for CAS table targets.  
 
 
Screenshot 2023-11-08 at 23.08.45.png
 
 
When CAS targets are specified, a CAS session must exist before the SAS callback object executes, a dependency that not all analysts may be aware of. For this reason, the custom step offers an alternative to the SAS.df2sd method for CAS targets, which uses the Python swat package to transfer the data from pandas to a CAS table. Some Python coders may already be familiar with the swat package as a means of running CAS actions from Python, and may choose the custom step for this purpose.
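As a rough illustration of the swat route, the sketch below wraps a call to the connection's upload_frame method, which swat provides for loading a pandas data frame into a CAS table. The helper name frame_to_cas, the table and caslib names, and the RecordingConnection stand-in are all hypothetical; the stand-in exists only so the sketch runs without a CAS server, and it is not how the custom step itself is implemented.

```python
class RecordingConnection:
    """Hypothetical stand-in for a swat.CAS connection (no server needed)."""

    def __init__(self):
        self.calls = []

    def upload_frame(self, data, casout=None, **kwargs):
        # A real connection would load `data` into CAS; here we only record the request.
        self.calls.append(("upload_frame", casout))
        return casout


def frame_to_cas(conn, frame, table, caslib="casuser"):
    """Upload a data frame to a session-scope CAS table, replacing any earlier copy."""
    casout = {"name": table, "caslib": caslib, "replace": True}
    return conn.upload_frame(frame, casout=casout)


# With a live server, the connection would come from swat instead, for example:
#   import swat
#   conn = swat.CAS(hostname, port)  # hostname and port are placeholders
conn = RecordingConnection()

# `frame=None` is a placeholder; a pandas data frame would be passed here.
requested = frame_to_cas(conn, frame=None, table="scored_customers")
```

Passing the connection in as a parameter keeps the session dependency explicit: the caller has to have a CAS session before the transfer is attempted.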
 
 

Garbage Collection

 

After data is handed off from Python to SAS, the pandas data frame continues to reside in memory and in the namespace of the Python environment. In most cases the data goes through further transformations in SAS anyway, so the pandas data frame may no longer be needed. However, in Python, deleting the data frame only removes the link between the name and the data it points to. Memory is reclaimed only once nothing else references that data, and objects caught in reference cycles wait for a process called garbage collection, which operates under some constraints and which the developer must code explicitly to trigger on demand. To make this convenient, this custom step provides an option to remove the data frame and perform garbage collection after the transfer is executed. This helps keep memory lean.
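The delete-then-collect pattern can be sketched in plain Python as follows. This is a minimal illustration using only the standard library, not the code inside the custom step: the Payload class is a hypothetical stand-in for a data frame, and it deliberately holds a reference to itself so that deleting the name alone does not free it.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object such as a pandas data frame."""

    def __init__(self, rows):
        self.rows = rows
        self.owner = self  # self-reference: creates a reference cycle


def release(namespace, name):
    """Remove `name` from `namespace`, then run a full garbage collection pass."""
    del namespace[name]
    gc.collect()


workspace = {"df": Payload(list(range(1000)))}

# A weak reference lets us observe the object without keeping it alive.
probe = weakref.ref(workspace["df"])

release(workspace, "df")
freed = probe() is None  # True once the collector has reclaimed the object
```

In CPython, an object without cycles is released as soon as its last reference disappears; the explicit gc.collect() call matters for objects that reference each other, which is why the sketch builds a cycle on purpose.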
 
 
Screenshot 2023-11-08 at 23.21.22.png
 
 

The "Quick Promote"

 
Some flows may perform the entire analytics process in Python and use a CAS table only as a final destination for visualisation purposes, for example in SAS Visual Analytics (VA). For such cases, this custom step also provides an option to promote the transferred table to global scope in CAS. This makes the table accessible from Visual Analytics, where it can be used within a report.
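For readers loading tables through code rather than the step's interface, promotion is typically requested through the output-table options of the load. The sketch below only assembles such an options dictionary; the helper name build_casout and the table and caslib names are hypothetical, and the assumption is that the result would be passed as the casout argument of a load call such as swat's upload_frame.

```python
def build_casout(table, caslib="casuser", promote=False):
    """Assemble output-table options for a CAS load (illustrative helper)."""
    casout = {"name": table, "caslib": caslib}
    if promote:
        # promote=True asks CAS to place the table in global scope,
        # so sessions other than the one that loaded it can see it.
        casout["promote"] = True
    return casout


# Session-scope table: visible only to the session that created it.
session_scope = build_casout("sales_summary")

# Global-scope table in a shared caslib (the caslib name is an assumption).
global_scope = build_casout("sales_summary", caslib="public", promote=True)
```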
 
 
Screenshot 2023-11-08 at 23.37.58.png
 

Other Python Objects

 

A bit of trivia: did you know that what's commonly referred to as a variable is, in Python, a name that refers to an object? There are some discussions on the net around this (and interestingly, some are related to the constraints around garbage collection mentioned earlier). In any event, this custom step enables you to transfer more than data frames to corresponding SAS objects. The supported object types are:
 
  1. Pandas data frames can be transferred to either CAS tables or SAS datasets, as already covered.
  2. Standard Python objects (int, str etc.) can be transferred to SAS macro variables.
  3. Lists (array-like data structures) can be transferred to CAS tables or SAS datasets (with a user-specified name serving as the column name for the list).
  4. Python dict objects, which resemble JSON, can be transferred to CAS tables or SAS datasets, using pandas data frames as an intermediary.

 

All the above options are available in the step; feel free to play with them.
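The mapping from object type to SAS target can be sketched as a small routing function. This is an illustration under stated assumptions rather than the step's actual code: the name plan_transfer is hypothetical, plain dictionaries of columns stand in for the pandas data frames the step uses as an intermediary, and wrapping scalar dict values into one-row columns is a choice made for this sketch. Inside Proc Python, the results would then be handed to the SAS object, for example SAS.symput for macro variables or SAS.df2sd for tables.

```python
def plan_transfer(obj, name):
    """Return (target_kind, payload) for a Python object.

    target_kind is "macro_variable" or "table". For tables, payload maps
    column names to lists of values.
    """
    if isinstance(obj, (bool, int, float, str)):
        # Single values become the text of a macro variable called `name`.
        return "macro_variable", str(obj)
    if isinstance(obj, (list, tuple)):
        # A list becomes a one-column table; `name` supplies the column name.
        return "table", {name: list(obj)}
    if isinstance(obj, dict):
        # Each key becomes a column; scalar values are wrapped as one-row columns.
        columns = {}
        for key, value in obj.items():
            is_sequence = isinstance(value, (list, tuple))
            columns[str(key)] = list(value) if is_sequence else [value]
        return "table", columns
    raise TypeError(f"unsupported object type: {type(obj).__name__}")


scalar_plan = plan_transfer(42, "row_count")
list_plan = plan_transfer([0.1, 0.7, 0.4], "score")
dict_plan = plan_transfer({"region": "EMEA", "sales": [10, 20]}, "unused")
```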
 
 
Screenshot 2023-11-08 at 23.36.26.png
 
Have fun with the "Load Objects to SAS" custom step and I hope it helps you with your open source integration initiatives.  Feel free to get in touch with any thoughts or questions.
Version history
Last update:
‎11-09-2023 12:09 AM
Updated by:
Contributors
