
Switch on, switch off: run-time control of SAS Studio Custom Steps

Started 07-21-2023
Modified 08-21-2023

Ever played with model trains as a kid? I did (with a neighbour's set, which left quite a few tracks broken, but that's another story). In some of the more expensive sets, you can use switches to move the train onto a different track. Fascinating.

Sigh. Owing to the many broken parts mentioned earlier, I never got to operate railway switches first-hand. Maybe that's what motivated this similar approach, which operates on SAS Studio Custom Steps instead.

SAS Studio Custom Steps are useful low-code components that help you encapsulate and execute SAS and Python programs, either standalone or within a SAS Studio Flow. A SAS Studio Flow follows a linear process and gives developers little flexibility to orchestrate conditional logic, unless that logic is embedded within individual programs.

While developing a flow, I faced a scenario where I needed to run a particular step as a matter of routine, except under a special condition (based on the characteristics of my input data). Had I continued to run the step under that condition, I would have carried out redundant, time-consuming work, which could prove costly with large data.

Therefore, I devised a mechanism, embedded in my custom step, to dynamically control whether the step executes. This is done through a trigger variable which can be changed at run time. Let's look at a simple example and also consider how replicable this approach is.
 

A simple example

 

Let's keep this relatable through a very simple example.  To begin with, follow along by importing this SAS Studio Flow, which uses two dummy datasets.  The GitHub repository contains instructions to import this artefact.

[Figure: the example SAS Studio Flow (sasstudioflow.png)]

Let's suppose you've received a dataset for analysis, over which you don't have much control (most publicly available data falls under this realm).  You desire that this dataset conform to certain standards in order to ensure high quality analysis, and included among those standards is the presence of a unique identifier.

Now, it's highly probable that your dataset already contains a column that purports to be a unique identifier (say, a complaint ID), but it's tough to take this at face value. For example, what if some complaints had follow-up interactions recorded as separate records (with the same ID)? You might consider taking the easy way out and creating a new column with a new unique ID, but that's additional processing which you may want to avoid where possible.

[Figure: the two dummy datasets (dummydatasets.png)]

In our example flow, we first create two dummy datasets. One of these has a non-unique complaint ID, and the other contains only unique complaint IDs. My objective is to first validate whether these IDs are indeed unique and, only if they aren't, go ahead and create a new unique ID variable.
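
The check described above can be sketched in a few lines. This is a minimal Python illustration of the logic only, not the code of the custom steps themselves (which operate on CAS tables); the column name `complaint_id`, the sample rows, and the helper `ids_are_unique` are all hypothetical.

```python
# Hypothetical stand-ins for the two dummy datasets in the flow.
non_unique_complaints = [
    {"complaint_id": 101, "note": "initial complaint"},
    {"complaint_id": 101, "note": "follow-up interaction"},
    {"complaint_id": 102, "note": "initial complaint"},
]
unique_complaints = [
    {"complaint_id": 201, "note": "initial complaint"},
    {"complaint_id": 202, "note": "initial complaint"},
]


def ids_are_unique(rows, id_column):
    """Return True when every row has a distinct value in id_column."""
    ids = [row[id_column] for row in rows]
    return len(ids) == len(set(ids))


# A new unique ID is only worth generating when the existing one fails the check.
for name, rows in [("non_unique", non_unique_complaints),
                   ("unique", unique_complaints)]:
    needs_new_id = not ids_are_unique(rows, "complaint_id")
    print(f"{name}: generate a new unique ID? {needs_new_id}")
```

Only the first dataset fails the check, so only it would need the extra processing of a generated ID.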

[Figure: the two scenarios (twoscenarios.png)]

Validating whether an identifier is indeed unique, for a SAS Cloud Analytic Services (CAS) table, is now possible thanks to the CAS - Validate Unique ID custom step, available through this link. Similarly, generating a unique ID is possible through the CAS - Generate Unique ID custom step. It is this second step that I have now enhanced with a macro variable, which I call a trigger variable. Take a look at the tab in the step that describes this variable; the description is self-explanatory.

 

[Figure: text description of the trigger variable (textdescription.png)]

 

 

Now, let's consider one of the swimlanes in our SAS Studio Flow. A swimlane executes linked nodes (SAS programs, Python programs and steps, including custom steps) from left to right, then top to bottom. In the normal scheme of operations, the "Generate Unique ID" step in the bottom portion of the lane would have executed regardless of whether all the IDs were unique. With the run-time control, however, I carry out a check after validating the unique ID and change the trigger variable to 0, thereby "disabling" the custom step. The step still exists in the flow, but its main execution code has been dynamically set to not execute.
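
The sequence in this swimlane can be sketched as follows. This is a Python illustration of the control logic only: in the actual custom step the trigger is a SAS macro variable, and every name used here (`run_trigger`, `validate_unique_id`, `generate_unique_id`, `new_id`) as well as the wording of the log messages is hypothetical.

```python
def validate_unique_id(rows, id_column, state):
    """Validation node: switch the trigger off when all IDs are unique."""
    ids = [row[id_column] for row in rows]
    if len(ids) == len(set(ids)):
        state["run_trigger"] = 0


def generate_unique_id(rows, state):
    """Gated node: the main logic runs only while the trigger is 1."""
    if state["run_trigger"] != 1:
        state["log"].append("NOTE: trigger is 0, Generate Unique ID skipped.")
        return rows
    state["log"].append("NOTE: Generate Unique ID executed.")
    # Add a sequential identifier as a new column (hypothetical name new_id).
    return [dict(row, new_id=i) for i, row in enumerate(rows, start=1)]


def run_swimlane(rows):
    """Run the two nodes in order, as a linear flow would."""
    # run_trigger plays the role of the trigger variable: 1 = run, 0 = skip.
    state = {"run_trigger": 1, "log": []}
    validate_unique_id(rows, "complaint_id", state)
    result = generate_unique_id(rows, state)
    return result, state["log"]


all_unique = [{"complaint_id": 1}, {"complaint_id": 2}]
duplicated = [{"complaint_id": 1}, {"complaint_id": 1}]

print(run_swimlane(all_unique)[1])
print(run_swimlane(duplicated)[1])
```

In both runs the gated node is still called; the trigger decides only whether its main logic does any work, which mirrors how the step remains in the flow while its execution code is bypassed.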

 

 

[Figure: one swimlane of the SAS Studio Flow (Swimlane.png)]

 

 

The results can be viewed in the log when we use an input table in which all IDs are unique. In that case, the "Generate Unique ID" step was not required to execute, and it did not. A message in the log says so.

 

 

[Figure: log output (log.png)]

 

 

In summary


We've successfully demonstrated how it's possible to control the execution of a custom step dynamically within a flow at run time. This type of conditional processing can be extended to other SAS Studio Custom Steps, based on need. Is it really required for all Custom Steps, though? The answer depends on your business problem. If you frequently find yourself in situations where the execution context does not require a step to run, then it's worth adding this run-time control.

Have fun trying out the example, and feel free to email me if you have any questions.

 
