Switch on, switch off: run-time control of SAS Studio Custom Steps

Ever played with model trains as a kid? I did (with a neighbour's set, breaking quite a few tracks in the process, but that's another story). Some of the more expensive sets come with switches which move the train onto a different track. Fascinating.

Sigh. Owing to the many broken parts mentioned earlier, I never got to operate railway switches first-hand. Maybe that motivated this similar approach which operates on SAS Studio Custom Steps, instead.

SAS Studio Custom Steps are useful low-code components which help you encapsulate and execute SAS and Python programs, either standalone or within a SAS Studio Flow. A SAS Studio Flow follows a linear process but does not provide developers much flexibility in orchestrating conditional logic, unless this logic is embedded within individual programs.

While developing a flow, I faced a scenario where I needed to run a particular step as a matter of routine, but not when I faced a special condition (based on my input data characteristics). Indeed, I might have carried out redundant (and time-consuming and ineffective) work if I had continued to run the step, which could prove costly in large data scenarios.

Therefore, I devised a mechanism embedded into my custom step to dynamically control the execution (or non-execution) of a custom step. This is carried out through a trigger variable which can be changed during run-time. Let's look at a simple example and also understand how replicable this approach is.
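The gist of the trigger mechanism can be modelled in a few lines. The sketch below uses Python purely for illustration; the step described in this post relies on a SAS macro variable, and the names `settings`, `RUN_TRIGGER` and `run_step` are hypothetical stand-ins rather than anything taken from the actual custom step.

```python
# Hypothetical stand-in for the session's macro variables.
settings = {"RUN_TRIGGER": 1}


def run_step(main_code):
    """Run main_code only while the trigger is switched on (1)."""
    if settings.get("RUN_TRIGGER", 1) == 1:
        return main_code()
    print("NOTE: trigger is 0, main execution code skipped.")
    return None


def main_code():
    """Placeholder for the work the step would normally do."""
    return "step executed"


first = run_step(main_code)      # trigger is 1, so the main code runs
settings["RUN_TRIGGER"] = 0      # changed at run time by an upstream node
second = run_step(main_code)     # trigger is 0, so the main code is skipped
```

The step itself never leaves the flow; only the value of the trigger decides whether its main code does any work.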

A simple example

Let's keep this relatable through a very simple example. To begin with, follow along by importing this SAS Studio Flow, which uses two dummy datasets. The GitHub repository contains instructions to import this artefact.

Let's suppose you've received a dataset for analysis, over which you don't have much control (most publicly available data falls under this realm). You desire that this dataset conform to certain standards in order to ensure high quality analysis, and included among those standards is the presence of a unique identifier.

Now, it's highly probable that your dataset already contains a column that might purport to be a unique identifier (say, a complaint ID), but it's tough to take this at face value. For example, what if some complaints had follow-up interactions manifested as a separate record (with the same ID)? You might consider taking the easy way out and creating a new column with a new unique ID, but that's additional processing which you may like to avoid as much as possible.

In our example flow, we first create two dummy datasets. One of these has a non-unique complaint ID, and the other contains only unique complaint IDs. My objective is to first validate whether these IDs are indeed unique and, only if they aren't, go ahead and create a new unique ID variable.
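To make the objective concrete, here is a rough Python sketch of "validate first, generate only if needed". The two small datasets, the column names and the helper functions `duplicated_ids` and `add_unique_id` are all made up for illustration; they are not the dummy tables or the custom steps used in the flow.

```python
from collections import Counter

# Two made-up datasets standing in for the dummy tables in the flow.
complaints_with_repeats = [
    {"complaint_id": 101, "note": "initial complaint"},
    {"complaint_id": 101, "note": "follow-up interaction"},
    {"complaint_id": 102, "note": "initial complaint"},
]
complaints_all_unique = [
    {"complaint_id": 201, "note": "initial complaint"},
    {"complaint_id": 202, "note": "initial complaint"},
]


def duplicated_ids(rows, id_column):
    """Return the sorted ID values that occur on more than one row."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def add_unique_id(rows, new_column="new_id"):
    """Return copies of the rows with a sequential ID column added."""
    return [{**row, new_column: i} for i, row in enumerate(rows, start=1)]


for name, rows in [("with repeats", complaints_with_repeats),
                   ("all unique", complaints_all_unique)]:
    # Only pay for the extra processing when duplicates were found.
    if duplicated_ids(rows, "complaint_id"):
        rows = add_unique_id(rows)
    print(name, "->", sorted(rows[0].keys()))
```

Only the dataset with a repeated complaint ID gains the extra column; the other one passes through untouched.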

Validating whether an identifier is indeed unique, for a SAS Cloud Analytics Services (CAS) table, is now possible thanks to the CAS - Validate Unique ID custom step, available through this link. Similarly, generating a unique ID is possible through the CAS - Generate Unique ID custom step. It is this step that I have now enhanced through a macro variable, which I call a trigger variable. Take a look at the tab in the step which explains this variable; it is self-explanatory.

Now, let's consider one of the swimlanes in our SAS Studio Flow. A swimlane executes linked nodes (SAS programs, Python programs and steps, including custom steps) from left to right, then top to bottom. In the normal scheme of operations, the "Generate Unique ID" step in the bottom portion of the lane would have executed irrespective of whether or not all the IDs were unique. With the run-time control, however, I carry out a check after validating the unique ID and, if all the IDs turn out to be unique, change the trigger variable to 0, thereby "disabling" the running of the custom step. The step still exists in the flow, but the main execution code within it has been dynamically set to not execute.
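The ordering within the swimlane can be sketched as below. This is a simplified Python model under stated assumptions, not the flow's generated code: the shared `context` dictionary plays the role of the SAS session holding the trigger macro variable, and the node functions and log messages are hypothetical.

```python
def validate_unique_id(context):
    """Validation node: records whether the ID column is unique."""
    ids = [row["complaint_id"] for row in context["table"]]
    context["ids_are_unique"] = len(ids) == len(set(ids))


def set_trigger(context):
    """Check node: switches the downstream step off when IDs are unique."""
    context["trigger"] = 0 if context["ids_are_unique"] else 1


def generate_unique_id(context):
    """Gated node: still part of the flow, but its main code may be skipped."""
    if context["trigger"] == 0:
        context["log"].append("Generate Unique ID: trigger is 0, not executed.")
        return
    for i, row in enumerate(context["table"], start=1):
        row["new_id"] = i
    context["log"].append("Generate Unique ID: executed.")


def run_swimlane(table):
    """Execute the linked nodes in order, sharing one context."""
    context = {"table": [dict(row) for row in table], "trigger": 1, "log": []}
    for node in (validate_unique_id, set_trigger, generate_unique_id):
        node(context)
    return context


unique_run = run_swimlane([{"complaint_id": 1}, {"complaint_id": 2}])
repeat_run = run_swimlane([{"complaint_id": 1}, {"complaint_id": 1}])
print(unique_run["log"])
print(repeat_run["log"])
```

Note that the gated node is always called; the decision to do nothing is taken inside it, which is what keeps the flow itself unchanged between runs.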

The results can be viewed in the log when we use an input table in which all the IDs are unique. In such a case, the "Generate Unique ID" step was not required to execute, and it did not. A message in the log confirms this.

In summary

We've successfully demonstrated how it's possible to control the execution of a custom step dynamically within a flow during run-time. This type of conditional processing can be extended to other SAS Studio Custom Steps, based on need. Is it really required for all Custom Steps, though? The answer depends on your business problem. If you are frequently in situations where the code execution context does not require a step to run, then it's worth adding this run-time component.

Have fun trying out the example, and feel free to email in case of any questions.