SAS Viya with Azure DevOps Data Pipeline

Watch the short video demonstration in this post. The video shows a SAS Viya and Azure DevOps pipeline. The pipeline focuses on data management activities in SAS Viya, such as running SAS programs and SAS Studio flows.

This is the second post in the series. The previous post offered an overview of SAS Viya CI/CD Pipelines with Azure DevOps.

Video Demonstration

The video shows an Azure pipeline that combines several data management tasks in SAS Viya.

The Git Repository

The "trick" is to store everything that you need to run in a Git repository. This example uses Azure Repos.

First, add the Azure Repo in SAS Studio. Second, create and test the SAS programs and SAS Studio flows in SAS Viya. Third, push the files containing the SAS programs and flows to the remote Git repository.

You can export the SAS Studio flow as a SAS package (JSON file) and then store it in the Git repository. You could also save the flow file itself (FLW file) in the same repository.

The Azure Pipeline

Several jobs define the pipeline. Each job performs a task:

Load a CAS table from a SAS program.
Import a SAS Studio flow from a package.
Manually validate the flow.
Run the flow to query the table and create an output table.
Save the output table as a file in Azure Data Lake Storage Gen2 (ADLS2).

The Azure pipeline definition, a YAML file, defines the tasks, their order, where they should run and when.

You can view and change this YAML file directly from SAS Studio.
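
As a rough sketch, the five jobs listed above could be laid out as follows. The job names, the placeholder echo steps, the reviewer address and the choice of the ManualValidation task in an agentless job are assumptions made for illustration; the definition shown in the video may differ.

```yaml
# Sketch only: job names and steps are hypothetical placeholders.
jobs:
  - job: LoadCasTable
    displayName: Load a CAS table from a SAS program
    steps:
      - script: echo "Submit the SAS program here"

  - job: ImportFlow
    displayName: Import a SAS Studio flow from a package
    dependsOn: LoadCasTable        # jobs run in parallel unless dependsOn is set
    steps:
      - script: echo "Import the flow package here"

  - job: ValidateFlow
    displayName: Manually validate the flow
    dependsOn: ImportFlow
    pool: server                   # agentless job, needed for ManualValidation
    steps:
      - task: ManualValidation@0
        timeoutInMinutes: 60
        inputs:
          notifyUsers: reviewer@example.com   # hypothetical reviewer
          instructions: 'Check the imported flow in SAS Studio, then resume or reject.'
          onTimeout: reject

  - job: RunFlow
    displayName: Run the flow and create the output table
    dependsOn: ValidateFlow
    steps:
      - script: echo "Execute the flow job request here"

  - job: SaveToAdls
    displayName: Save the output table as a file in ADLS2
    dependsOn: RunFlow
    steps:
      - script: echo "Save the output table here"
```

Each job waits for the previous one through dependsOn, so the five tasks run in the order listed.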

When Will the Pipeline Run?

The trigger type in the pipeline definition sets when the pipeline will start. For instance, the pipeline can start every time you push a change to the main branch of the Git repository.
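
A minimal sketch of such a trigger, assuming the branch is named main:

```yaml
# Start the pipeline on every push to the main branch.
trigger:
  branches:
    include:
      - main
```

Setting trigger to none instead would leave the pipeline to manual or scheduled runs only.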

Where Will the Pipeline Run?

The pipeline runs on a self-hosted agent, a virtual machine that can communicate with the SAS Viya deployment. A future post will cover agents in more detail.
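
In the pipeline definition, a self-hosted agent is usually selected through its agent pool. The sketch below assumes a pool called sas-viya-agents, which is a made-up name; use the pool in which your own agent is registered.

```yaml
# Run the jobs on a self-hosted agent pool (hypothetical pool name).
pool:
  name: sas-viya-agents
```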

What Will the Pipeline Run?

The pipeline uses the SAS Viya CLI (command-line interface) to interact with SAS Viya:

To run the SAS programs: sas-viya batch jobs submit-pgm.

To import the SAS Studio flow: sas-viya transfer import.

To run the SAS Studio flow: sas-viya job requests execute.

For the flow to run, it needs a job request. The easiest way to generate a job request is to schedule the flow. If you change the SAS Studio flow, do not forget to reschedule it to get a new job request. Work is in progress on a REST API to run flows directly from the SAS code they generate.
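
The three commands could be wired into pipeline steps along these lines. This is a sketch under several assumptions: the agent already has the sas-viya CLI installed and an authenticated profile, the program path and the two IDs are placeholders, the batch context is named default, and the package was uploaded beforehand so that it has an ID. The command options shown are worth checking against the help output of your CLI version.

```yaml
variables:
  sasProgram: programs/load_cas_table.sas    # hypothetical path in the repository
  packageId: REPLACE_WITH_PACKAGE_ID         # ID of the uploaded transfer package
  jobRequestId: REPLACE_WITH_JOB_REQUEST_ID  # created when the flow is scheduled

steps:
  # Run a SAS program stored in the repository and stream its log.
  - script: >
      sas-viya batch jobs submit-pgm
      --pgm-path "$(Build.SourcesDirectory)/$(sasProgram)"
      --context default
      --watch-output --wait-log-list
    displayName: Run the SAS program

  # Import the SAS Studio flow from a package that is already uploaded.
  - script: sas-viya transfer import --id "$(packageId)"
    displayName: Import the SAS Studio flow

  # Run the flow through its job request.
  - script: sas-viya job requests execute --id "$(jobRequestId)"
    displayName: Run the SAS Studio flow
```

Because rescheduling a changed flow produces a new job request, the jobRequestId value has to be updated whenever that happens.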

Considerations and Open Discussion

What can you achieve with such a pipeline? You can:

Group all the related technical jobs by objective. Adopt a process-oriented approach.
Decouple the jobs. Suppose we have three developers working together to deliver the “data product”, the output file, in this example. Two prefer to code in SAS, one prefers SAS Studio flows. All of them have different working schedules. A pipeline with a Git repository would keep everything together. The same pipeline would give them the flexibility to work independently and incrementally on their programs or flows. Their customer will receive the "data product" earlier.
Test whether the whole still works. The pipeline can run after each incremental change. The pipeline "build" implicitly tests whether each job still works after another job has changed.
Put SAS Viya in an enterprise perspective. The tasks in SAS Viya might depend on other tasks or scripts operating outside SAS Viya. For example, a bash script might be needed to copy the input files to a mount accessible to SAS Viya. Azure CLI commands might be required to copy the output files somewhere else, assign user permissions and so on. An Azure pipeline can contain tasks operating inside and outside SAS Viya.
Combine different types of jobs. You can mix jobs that execute code with import jobs and manual review steps.
Sequence and condition the execution of jobs. Start a job only after other jobs have completed, and make it depend on whether they succeeded or failed.
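
Sequencing and conditions can be expressed with dependsOn and condition. In the sketch below the job names and echo steps are hypothetical: one job runs only when the flow succeeded, another only when it failed.

```yaml
# Sketch only: job names and steps are hypothetical placeholders.
jobs:
  - job: RunFlow
    steps:
      - script: echo "Run the flow here"

  - job: SaveOutput
    dependsOn: RunFlow
    condition: succeeded()     # start only if RunFlow succeeded
    steps:
      - script: echo "Save the output file here"

  - job: ReportFailure
    dependsOn: RunFlow
    condition: failed()        # start only if RunFlow failed
    steps:
      - script: echo "Notify the team here"
```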

Conclusions

The post demonstrated a SAS Viya and Azure DevOps pipeline. The pipeline ran data management tasks in SAS Viya. All the pipeline code was stored in a Git repository (Azure Repos). The pipeline interacted with SAS Viya through the SAS Viya CLI. The post ended with an open discussion of when and why you might use SAS Viya and Azure Pipelines for data management tasks.

Resources

Thank you for your time reading this post. If you liked the post, give it a thumbs up! Please comment and tell us what you think about this topic. If you wish to get more information, please write me an email.
