
R Runner: get from SAS to R through a Python tunnel

Started 08-27-2023 by
Modified 09-07-2023 by
Beyond a certain point, programming language shouldn't matter.
 
Analytics developers appreciate unified platforms that accommodate different programming environments and languages, be they SAS, Python, R or any other. But access to multiple languages only pays off when those languages interoperate seamlessly.
 
SAS Studio, an application within SAS Viya, offers powerful data engineering and analytics through low-code and programming components. It already provides an easy interface to Python through a Python editor and PROC PYTHON. At present, however, there is no similarly easy route to R.
 
This article describes R Runner, a SAS Studio Custom Step, which helps you program in R from SAS Studio on SAS Viya.  
 
Teams who code in both SAS and R can now build more integrated analytics. This especially benefits industries where we've seen strong interest in using SAS and R together, such as pharma, healthcare and life sciences, parts of the insurance industry, and the public sector.
 
Watch this video for a quick description of what you can do with R Runner.
 
 
 

Access R Runner

 
SAS Studio Custom Steps are low-code components that package complex programming logic into an easily consumable form, which can be reused across programs and sessions in a repeatable manner.
 
You can access R Runner from this folder in the SAS Studio Custom Steps GitHub repository; here's also a direct link to its README. Follow the instructions in the README to import the custom step into a SAS Viya environment.

 

 

Use R Runner

 
R Runner offers a simple, no-nonsense capability: run R programs from a SAS Studio session, either standalone or as part of a larger process, typically designed as a SAS Studio Flow. The following simple building blocks illustrate how.

 

Provide input data 

 
Most analytics require input data. Attach input data to R Runner through its input table port. If you are running the step within a flow, you may not see this port at first: right-click the step and select "Add input port" for the "inputtable" port. If you are running the step standalone, select an input table as directed under "Provide an input table".
 
 
Here's what happens. When the step executes with an input table attached, PROC PYTHON converts that table to a pandas DataFrame, using the SAS callback method under the covers. The Python package rpy2 then converts this pandas DataFrame to an R data frame, the tabular data structure that R works with. You don't need to write any code for either conversion. Once the conversion is complete, your R code is executed; inside the R environment, the uploaded data frame is available as "r_input_table".
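The conversion described above can be sketched in Python as it might run inside PROC PYTHON. This is a minimal sketch, not R Runner's actual source: it assumes rpy2 3.x with pandas installed and a reachable R installation, and it assumes the SAS callback in question is `SAS.sd2df`, which returns a SAS table as a pandas DataFrame. The helper name `push_input_table` is hypothetical.

```python
# Minimal sketch (not R Runner's actual source): SAS table -> pandas -> R.
# Assumes rpy2 3.x and pandas are installed and R can be found.
try:
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter
    HAVE_RPY2 = True
except Exception:  # rpy2/pandas missing, or R itself could not be found
    HAVE_RPY2 = False


def push_input_table(sas, dataset, r_name="r_input_table"):
    """Copy a SAS table into R's global environment as a data frame.

    `sas` stands for the callback object PROC PYTHON exposes as SAS;
    `push_input_table` itself is a hypothetical helper name.
    """
    if not HAVE_RPY2:
        raise RuntimeError("rpy2 (with pandas and R) is required")
    df = sas.sd2df(dataset)  # SAS dataset -> pandas DataFrame
    # Inside this context, assigning a pandas DataFrame to an R
    # environment converts it to an R data.frame on the way in.
    with localconverter(ro.default_converter + pandas2ri.converter):
        ro.globalenv[r_name] = df
```

Inside PROC PYTHON, a call such as `push_input_table(SAS, "work.mydata")` (dataset name hypothetical) would leave `r_input_table` in R's global environment for any R code that runs afterwards in the same session.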

 

Run an R snippet

 
You have two options for submitting code to the step. Sometimes you may only want to run a short set of R commands, for simple tasks such as summarising a data frame or creating a frequency table. You can enter such snippets in the text area on the custom step, which accepts a maximum of 32,768 characters. For longer programs, attach an R program instead.
 
 
Here's what happens. R commands entered in the text area are written to a temporary file, which is then passed to the r object in rpy2.robjects. Refer here for an example in the rpy2 documentation that shows the kind of code executed behind the scenes. Because similar code is built into this custom step, your R code is handed to the R interpreter through rpy2 without any extra effort on your part.
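The snippet path can be sketched as follows, assuming rpy2 is available: the code is checked against the 32,768-character limit, written to a temporary `.R` file and handed to R's `source()` function, obtained as `rpy2.robjects.r["source"]`. This is a minimal sketch rather than the step's code; the helper names are hypothetical, and the `r_source` parameter exists only so the logic can be exercised without R (inside PROC PYTHON you would leave it at its default).

```python
import os
import tempfile

MAX_SNIPPET_CHARS = 32768  # text-area limit quoted in the article


def write_snippet(code):
    """Write an R snippet to a temporary .R file and return its path."""
    if len(code) > MAX_SNIPPET_CHARS:
        raise ValueError(
            f"snippet has {len(code)} characters; attach an R program instead"
        )
    fd, path = tempfile.mkstemp(suffix=".R")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(code)
    return path


def run_snippet(code, r_source=None):
    """Run an R snippet by sourcing it from a temporary file."""
    if r_source is None:
        import rpy2.robjects as ro  # needs rpy2 and a working R installation
        r_source = ro.r["source"]  # R's base::source() as a Python callable
    path = write_snippet(code)
    try:
        return r_source(path)
    finally:
        os.remove(path)  # clean up even if the R code raises an error
```

By default, R's `source()` evaluates the file in the global environment, which is why objects created earlier, such as `r_input_table`, remain visible to the snippet.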
 
In the interest of full transparency, note that the text area is a generic component of the SAS Studio Custom Steps framework, not an editor that understands R code. Do not expect syntax checking, automatic indentation or any of the other conveniences you may know from editors such as Visual Studio Code or RStudio.

 

Reference an R program

 
Not all R programs are short enough to enter directly in the text area described above. For longer, more involved code that you wish to lift and shift to SAS Viya, simply attach your R program to the step for execution. This makes it easy to reuse an existing R codebase quickly while keeping changes to a minimum.
 
 
Here's what happens. When you provide an R file reference, R Runner checks the file's location and then passes the file directly to the r object of rpy2.robjects. The code is therefore run as-is, without any intermediate processing.
 
Referencing an R program brings a second advantage: other SAS Studio conveniences, such as Git integration, let you reference R programs in a local folder linked to a Git repository. When upstream changes to a Git-managed R codebase are synced to that local folder, your process (which may combine this custom step with other SAS Studio objects) automatically picks them up. Reference R programs located on the filesystem (disk storage attached to the Viya environment), not files located in SAS Content (the Infrastructure Data Server). The custom step has been built to work with filesystem content so that program artefacts can be transferred easily and can take advantage of Git.
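The file-reference path can be sketched the same way. This sketch assumes that SAS Studio file selectors return references prefixed with `sasserver:` for filesystem files and `sascontent:` for SAS Content; treat those prefixes, and the helper names, as assumptions rather than the step's documented behaviour. Filesystem paths are checked and then passed unchanged to R's `source()`, with no intermediate copy.

```python
import os

FILESYSTEM_PREFIX = "sasserver:"    # assumed prefix for filesystem files
SAS_CONTENT_PREFIX = "sascontent:"  # assumed prefix for SAS Content items


def resolve_r_program(ref):
    """Turn a file reference into an absolute filesystem path, or fail."""
    if ref.startswith(SAS_CONTENT_PREFIX):
        raise ValueError("R programs in SAS Content are not supported; "
                         "reference a file on the filesystem instead")
    if ref.startswith(FILESYSTEM_PREFIX):
        path = ref[len(FILESYSTEM_PREFIX):]
    else:
        path = ref
    if not os.path.isfile(path):
        raise FileNotFoundError(f"R program not found: {path}")
    return os.path.abspath(path)


def run_r_program(ref, r_source=None):
    """Source an R program as-is, without copying or rewriting it."""
    path = resolve_r_program(ref)
    if r_source is None:
        import rpy2.robjects as ro  # needs rpy2 and a working R installation
        r_source = ro.r["source"]
    return r_source(path)
```

Because the program is sourced in place, a `git pull` into the linked folder is all it takes for the next run to use the updated code.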

 

Export output data

 
Finally, once your R program has finished, it's time to access any output data it generated. Unless the program explicitly writes results to a file, this output lives in the R session as an R variable or an R data frame. You may want to use it in downstream processes, which may involve other SAS programs (one of the advantages of running R on a unified platform). R Runner makes it easy to export output data frames from the R session to a SAS dataset for downstream processing: specify the name of the R data frame and provide an output dataset to hold the resulting data.
 
 
Here's what happens. This part of the process is, in some ways, the reverse of what occurs when ingesting input data. First, rpy2 converts the specified R data frame to a pandas DataFrame. PROC PYTHON then writes this DataFrame to the desired SAS dataset using the SAS callback method. Once the data is output, you are free to work with it any way you like, using SAS or Python programs. Alternatively, if you wish to keep working on the R data frame, you can skip creating an output dataset and instead refer to the same R global environment variable (i.e. the data frame's name) in a subsequent R Runner step. The same R session is maintained as long as a valid Python session (holding the rpy2 object) is in place.
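The export path can be sketched as the mirror image of the input conversion. Again, this is a minimal sketch under assumptions, not the step's code: it assumes rpy2 3.x with pandas, and that the SAS callback in question is `SAS.df2sd`, which writes a pandas DataFrame to a SAS dataset. `pull_output_table` is a hypothetical name. Reading the data frame does not remove it from R's global environment, which is what lets a later R Runner step in the same session keep using it.

```python
# Minimal sketch (not R Runner's actual source): R -> pandas -> SAS table.
try:
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter
    HAVE_RPY2 = True
except Exception:  # rpy2/pandas missing, or R itself could not be found
    HAVE_RPY2 = False


def pull_output_table(sas, r_name, dataset):
    """Copy an R data frame from the global environment into a SAS dataset."""
    if not HAVE_RPY2:
        raise RuntimeError("rpy2 (with pandas and R) is required")
    # Look only in the global environment, not in attached packages.
    if not ro.r["exists"](r_name, envir=ro.globalenv, inherits=False)[0]:
        raise KeyError(f"no R object named {r_name!r}")
    if not ro.r["is.data.frame"](ro.globalenv[r_name])[0]:
        raise TypeError(f"R object {r_name!r} is not a data frame")
    # Inside this context, reading from an R environment yields pandas objects.
    with localconverter(ro.default_converter + pandas2ri.converter):
        df = ro.globalenv[r_name]
    sas.df2sd(df, dataset)  # pandas DataFrame -> SAS dataset
    return df
```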

 

An eye to the future

 
We've released the first version of R Runner, focussing on the essentials: ensuring that R code can be executed, that data can be sent to R, and that data can be output from R, all in service of a larger integrated process in SAS Studio. We are actively considering and developing some exciting future improvements, including an easy outlet for graphics (plots and charts), better redirection of output wherever feasible and more transparent logging. We welcome your suggestions for enhancements and improvements; please drop us an email by clicking here. Have fun with R Runner.
Comments

Just an FYI: the R Runner custom step is now available in the sas-studio-custom-steps repository on GitHub. Here is a direct link to the folder containing the R Runner step.

Thanks, @Wilbram-SAS, the links in the article have been updated.
