SAS Studio Custom Steps provide a low-code user interface to SAS programs, promoting code reusability and accelerating analytics development. As a contributor to the SAS Studio Custom Steps GitHub repository (and a maintainer of other similar custom step repos), which continues to grow, I have found myself increasingly preoccupied with the following thoughts:
Multiple interfaces benefit from the same code: From a SAS perspective, this appeals to me because many programmers execute SAS programs without custom steps, and some prefer alternatives to SAS Studio such as Visual Studio Code (for which a SAS extension is available). Also, during the past year, SAS launched a developer-centric offering, SAS Viya Workbench (refer to some demos here), which does not require access to an enterprise SAS Viya environment. Leaving SAS-specific considerations aside, I still find the idea appealing due to potential ease of maintenance.
Robust code requires rigorous testing: Most custom steps require numerous input parameters, some combinations of which might result in an error. A simple example is when input data requires different execution paths based on whether it is a SAS dataset or a CAS table. Human nature leads custom step designers to confine themselves to ‘happy paths’ and limited edge cases (also remember, most custom steps are created by analytics practitioners looking for quick solutions). As the number of test scenarios increases, it becomes labour-intensive to design and execute them. In effect, I wait for my ‘customers’ (users) to helpfully raise bugs in future.
These thoughts drove me to devise an approach which eases custom step code testing, which I’d like to share through this article. Interestingly, this approach does not have much to do with the custom step builder in SAS Studio, but is more concerned with the broader programming environment under which code runs. This is because I prefer testing the code standalone first (in Visual Studio Code) and then wiring it to a custom step UI (involving further tests that are out of scope for this article). This also implies that, if useful, you can extend this approach to testing any SAS program, not just those which power custom steps. (A quick side-note: SAS code here also covers Python scripts called during the execution of SAS programs.)
About the Environment
The environment is what the environment facilitates. Okay, I allow my practical side latitude over the pedantic in this description. I focus upon just two aspects of the programming environment that help me in my objective, namely autoexecs and compute contexts.
Autoexecs
The autoexec.sas file is a SAS program that runs every time you start a SAS session. It enables you to specify environment variables (and/or macro variables) relating to different test scenarios, which can then be picked up by the custom step’s SAS program (the ‘test’ program).
Let us consider this custom step that, incidentally, is under development at the time of writing this article (and will likely be published shortly). Without getting into details, this step helps users interact with an LLM (Large Language Model), passing data contained in a SAS dataset (or a CAS table). As you may notice from the list of variables used in this step (available here), quite a number of input parameters are asked for. Multiple options provide additional opportunity for error. For example, the user is free to specify either the SAS or the CAS engine for both input and output tables, which in itself necessitates four different tests (and that’s only testing one aspect of the code!)
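To make the four engine combinations concrete, here is a minimal sketch of how a test program might detect which engine sits behind a table's libref. The macro name _get_engine and the table names are hypothetical, and the sketch assumes the libref is already assigned in the session; it is not taken from the step's actual code.

```sas
/* Hypothetical helper: look up the engine behind a table's libref.   */
/* The macro name and the table names below are illustrative only.    */
%macro _get_engine(table, outvar);
   %local libref;
   /* One-level names are treated as WORK here (the USER libref is ignored) */
   %if %index(&table., .) > 0 %then %let libref = %upcase(%scan(&table., 1, .));
   %else %let libref = WORK;
   %global &outvar.;
   /* Stays UNKNOWN if the libref is not assigned in this session */
   %let &outvar. = UNKNOWN;
   proc sql noprint;
      select distinct engine into :&outvar. trimmed
      from dictionary.libnames
      where libname = "&libref.";
   quit;
%mend _get_engine;

%_get_engine(WORK.MY_INPUT, inputEngine);
%_get_engine(PUBLIC.MY_OUTPUT, outputEngine);
%put NOTE: Input engine=&inputEngine., output engine=&outputEngine.;
```

A CAS libref should report an engine of CAS, so a program could branch on these two values and cover each of the four input/output pairings with its own test scenario.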
The SAS program which powers the custom step is available for testing here. Now, let us consider the programming interface. As mentioned, I prefer the SAS extension for Visual Studio Code but, in deference to those who like SAS Studio, here’s how you access the autoexec in a SAS Studio session:
Go to SAS Studio (Develop Code and Flows)
Select Options -> Autoexec file
This is depicted below. Interestingly, this was an opportunity for me to sign in as a non-administrator after a long time, just to ensure that the steps are relevant for non-admin users (who have fewer privileges than administrators). Thankfully, no surprises.
What if you were in Visual Studio Code? Here, you don’t have the benefit of being inside the 'mother ship'. Rather, imagine yourself as a small rocket trying to communicate with a larger space station (or whatever it’s called; conquering outer space is best left to others). Here, not only do you have to submit your test program, but you also have to submit the code that you want executed prior to the program. Follow the instructions provided here.
Autoexec Contents
The next question could be: what do I enter in the autoexec code? Specify values for your required test input parameters either as global macro variables (recognised within SAS programs) or as environment variables, through the options set statement. While I have used 'options set' throughout for reasons of uniformity (these are environment variables set in the underlying operating system), your mileage may vary and you might need to consider this decision carefully. Be cognizant of the differences between environment variables and macro variables, and their respective advantages and limitations.
Using the earlier example, my list of parameters works out to something like the below. Of course, I’m being careful not to expose sensitive details. I’m sure you’ll let me know if I have slipped up (seriously, please do let me know through comments; I need to address and improve).
Remember, when you use “options set=..”, you are basically dealing with environment variables. Some environment variables can be introduced even at the level of the node which hosts the compute session pod. Therefore, be aware of the variables and options you set, and of their scope, especially on a shared system.
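As a rough illustration of the two options, the sketch below sets one parameter each way and reads it back. The parameter names come from this article's example, while the values are placeholders I have assumed for illustration.

```sas
/* Option 1: a global macro variable, known only to the SAS macro processor */
%let textCol = review_text;
%put NOTE: textCol (macro variable) = &textCol.;

/* Option 2: an environment variable, read back with %sysget */
options set=docId="review_id";
%put NOTE: docId (environment variable) = %sysget(docId);
```

The macro variable is resolved with an ampersand, whereas the environment variable has to be fetched explicitly with %sysget, which is why a program that relies on environment variables needs a step that copies them into macro variables.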
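For illustration only, an autoexec for a single test scenario (SAS dataset in, CAS table out) might look something like the sketch below. Every value is a placeholder I have assumed, not an actual setting from the step, and only a subset of the parameters is shown.

```sas
/* Illustrative autoexec.sas for one test scenario.                  */
/* All values are placeholders, not the step's actual settings.      */
options set=inputData="WORK.TEST_REVIEWS";
options set=outputTable="PUBLIC.TEST_RESPONSES";
options set=docId="review_id";
options set=textCol="review_text";
options set=systemPrompt="You are a helpful assistant.";
options set=userPrompt="Summarise the following review.";
options set=temperature="0.2";
/* Point to where a secret is stored rather than pasting the secret itself */
options set=azureKeyLocation="/path/to/key/file";
```

A second scenario (say, CAS table in and SAS dataset out) would be a copy of this file with only the two table references changed.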
Compute Contexts
Let’s shift gears a bit and consider compute contexts. Note that compute contexts are controlled by a SAS Administrator, always a reminder that it helps to be in the good books of those people (by the way, assuming elevated privileges on a Viya environment gives you the SAS Administrator role). Compute contexts are documented here. We don’t need to get into the details; suffice it to say that compute contexts provide a medium to specify information that’s used when running a workload (program) in a SAS Compute server session. This information includes autoexec settings, which is where contexts prove useful.
For a majority of cases, it may be enough to use autoexec.sas directly to change test settings. However, as experienced frequently, a change made to a program at later stages of development may harm tests that passed earlier, a classic regression testing indicator. In such a case, instead of manually changing autoexec contents (or swapping out autoexec.sas files), you might find it easier to just switch contexts, each of which gets wired to a set of test parameters. Therefore, as you engage in development of a custom step, you might find it useful to work with your administrator beforehand and set up contexts for testing as shown below.
For Visual Studio Code, there’s an added bonus: not only can you use multiple contexts (that your administrator helped create), you can also define multiple profiles that are associated with autoexec.sas files containing different parameters. This also implies you can reuse a standard context such as SAS Job Execution Compute Context, and specify different autoexec.sas files per VS Code profile, reducing dependency on the admin for multiple contexts.
To define multiple profiles, press Ctrl (or Cmd, on a Mac) + Shift + P in your Visual Studio Code window, type SAS: Add New Connection Profile, and provide the required parameters, as shown below. Note, however, that autoexecs execute upon the start of a SAS session, not merely upon switching between profiles in Visual Studio Code.
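Because the autoexec only runs at session start, a quick check at the top of a test run can confirm that the expected parameters were actually loaded. The following is a sketch under that assumption; the macro name _check_params is hypothetical, and the list of names is a subset of this article's example.

```sas
/* Hypothetical check: confirm that the expected test parameters were loaded. */
%macro _check_params(names);
   %local i name;
   %do i = 1 %to %sysfunc(countw(&names., %str( )));
      %let name = %scan(&names., &i., %str( ));
      /* %sysexist returns 1 when the environment variable is defined */
      %if %sysexist(&name.) %then %put NOTE: &name. = %sysget(&name.);
      %else %put WARNING: &name. is not set. Restart the session after switching profiles.;
   %end;
%mend _check_params;

%_check_params(inputData outputTable docId textCol temperature);
```

If the log shows values from a previous scenario, or warnings about missing variables, the session most likely predates the profile switch and needs to be restarted.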
Wiring your Program for Tests
Given a SAS program under development, how do you ensure it receives updated test parameters without repeated changes? For this purpose, I make it a point to add a commented “debug section” (call it a test section if you like) in my code. The debug section, when uncommented, simply takes its values from environment variables (set through options set) or global macro variables defined upstream.
The variables defined in the section below are from the earlier example, and obviously change based on your program’s objective.
/*-----------------------------------------------------------------------------------------*
   DEBUG Section
   Code under the debug section SHOULD ALWAYS remain commented unless you are tinkering with
   or testing the step!
*------------------------------------------------------------------------------------------*/
/* Provide test values for parameters.                                          */
/* Each %sysget reads an environment variable set upstream (e.g. in autoexec)   */
/* and the data step copies it into a macro variable of the same name.          */
data _null_;
   call symput('inputData', "%sysget(inputData)");
   call symput('systemPrompt', "%sysget(systemPrompt)");
   call symput('userPrompt', "%sysget(userPrompt)");
   call symput('userExample', "%sysget(userExample)");
   call symput('docId', "%sysget(docId)");
   call symput('textCol', "%sysget(textCol)");
   call symput('azureKeyLocation', "%sysget(azureKeyLocation)");
   call symput('azureOpenAIEndpoint', "%sysget(azureOpenAIEndpoint)");
   call symput('azureRegion', "%sysget(azureRegion)");
   call symput('openAIVersion', "%sysget(openAIVersion)");
   call symput('outputTable', "%sysget(outputTable)");
   call symput('genModelDeployment', "%sysget(genModelDeployment)");
   /* temperature is numeric, so it is passed unquoted */
   call symputx('temperature', %sysget(temperature));
run;
What happens during a production run, i.e. once the Custom Step has been published? The debug section is commented and does not run. Only upon an error which demands trial runs and iterations does the step consumer need to uncomment this section and try changing parameters to see what happens. This way, a test tool (during development) becomes a debug tool (during production).
The other advantage is that the user needn’t be confined to autoexecs and variables defined in contexts, but is free to modify the input parameters ad hoc, as long as it helps in resolving the error.
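As a small sketch of such an ad-hoc change, a user could override individual values right after the uncommented debug section. The table name and values below are assumptions for illustration.

```sas
/* Hypothetical ad-hoc override, placed right after the debug section. */
/* %let statements issued here replace the values read via %sysget.    */
%let inputData   = WORK.PROBLEM_ROWS;
%let temperature = 0;
%put NOTE: Debug run with inputData=&inputData. and temperature=&temperature.;
```

Since the override lives in the program rather than the autoexec, it takes effect on the next submission without restarting the session.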
Custom step creators can also take advantage of this mechanism to include some tests or examples as a way to demonstrate the SAS program behind a published step.
In summary
Autoexecs, compute contexts, and features in third-party editors such as Visual Studio Code enable Custom Step creators to test SAS programs more rigorously prior to hooking them to a UI (or even to run them standalone). An easy testing experience also drives readiness to design and execute more test scenarios, thus ensuring rigour in the process.
This improves quality of output and first-pass yield, reducing future bug possibilities. Also, creators can decouple code generation from UI and work on each component in a modular and focussed manner. Such focus can yield a SAS program that is suitable or easily modifiable for other targets (SAS Viya Workbench, Visual Studio Code extension and SAS Job Execution, among others).
As a practice, I follow a folder structure which helps me implement this framework effectively. I save my SAS program and my UI components separately and only combine them when it’s time to build the final Custom Step. In GitHub repos, as this recent example shows, the program resides in an /extras folder, explained here.
Progress on this front has also motivated other tools to help semi-automate creation of the UI, and even build the step, details of which can be shared in a subsequent post.
Feel free to experiment with this approach. It works for me, but perhaps you might like to do something slightly different, even radically different. In any case, please share your views through comments, or you can email me here: Sundaresh.sankaran@sas.com.