
Moving your data and analytics to the cloud with SAS Viya and Snowflake: operationalising


This is the second part of the blog series on moving your data and analytics to the cloud with SAS Viya and Snowflake, based on the experience of supporting a global SAS customer. This blog describes the process of operationalizing data pipelines using a continuous integration, continuous delivery (CICD) pipeline once you have completed testing in the development environment. At that point, you are ready to push the changes to git so that a CICD pipeline can take it from there.

 

Moving into git

Fortunately for us, SAS Studio has a user-friendly interface to git, so it is easy to commit the changes and push them to our remote repository managed by Azure DevOps. Azure DevOps then starts a CICD pipeline that is defined in the repository. In our case, the pipeline was triggered by the push of changes into the repository, but pipelines can also be triggered by schedules and other events.
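
For reference, the push trigger is declared at the top of the pipeline definition. Here is a minimal sketch, assuming the repository's default branch is named main:

```yaml
# azure-pipelines.yml -- run the pipeline on every push to main
# (the branch name is an assumption; use your repository's branch)
trigger:
  branches:
    include:
      - main
```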

A CICD pipeline typically has three stages: Build, Test and Deliver. Each stage has a set of jobs that fulfil its purpose. Some pipelines also automate right through to deployment of the artifacts to make them fully operational. In our case, that final step was left to those in charge of deployment; this is often the case in regulated industries, or where there are other reasons to perform those steps manually. The three stages all have important roles (a minimal sketch of such a multi-stage pipeline follows the list):

  • Build – We build what we need in order to test, and later deliver, the artifacts. This stage also includes the preparation required to run a trustworthy test at the next stage, such as cleaning out previous results so that validation is based only on the results created in the test run.
  • Test – We run the same tests as we did when we developed our flow, but this time in our test environment. This is configured like the production environment, so we can be confident that we will get similar results in production.
  • Deliver – We do some preparation in the production environment before delivering the now-tested artifacts all the way to production.
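
Here is that three-stage structure as a sketch, with hypothetical script names standing in for our actual jobs:

```yaml
stages:
  - stage: Build
    jobs:
      - job: build
        steps:
          # prepare the test environment and clean out previous results
          - script: ./scripts/prepare_and_clean.sh
            displayName: Prepare test environment
  - stage: Test
    dependsOn: Build
    jobs:
      - job: test
        steps:
          # run the same tests we ran during development
          - script: ./scripts/run_flow_tests.sh
            displayName: Run flow tests
  - stage: Deliver
    dependsOn: Test
    jobs:
      - job: deliver
        steps:
          # prepare production and deliver the tested artifacts
          - script: ./scripts/deliver_artifacts.sh
            displayName: Deliver to production
```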

 

Requirements to support a CICD pipeline

Three environments are required to enable a CICD pipeline (development, test and production), coupled with Azure DevOps and an Azure DevOps agent.

In our case, the Azure DevOps agent was a self-hosted agent running on Linux that we registered with this Azure DevOps project (there is a good primer on how to do this here). Following this guideline, we placed our self-hosted agent in the default agent pool. The self-hosted agent can run on any Linux machine (virtual or physical) that has network connectivity to Azure DevOps and the environments it interacts with; in our case, the test and production environments.
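
Once registered, the agent is selected in the pipeline by naming its pool. A minimal sketch, assuming the agent sits in the default pool as ours did:

```yaml
# run the pipeline's jobs on our self-hosted Linux agent
# rather than a Microsoft-hosted agent
pool:
  name: Default
```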

We equipped the agent with the SAS Viya CLI (the link will require that you identify yourself to sas.com - anyone can register), so that scripted commands executed on the agent could communicate with the SAS Viya environment. We also added the Python Viya tools from GitHub to further facilitate communication with the SAS Viya test and production environments and enable the agent to run the required stages.
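
As an illustration, an early pipeline step can authenticate the agent's CLI session before any jobs run. This is a sketch only: the variable names are assumptions, and the exact sas-viya CLI options should be verified against your installed version:

```yaml
steps:
  - script: |
      set -euo pipefail
      # point the CLI at the test environment and log in;
      # credentials come from pipeline variables, not the repository
      sas-viya profile set-endpoint "$VIYA_ENDPOINT"
      sas-viya auth login --user "$VIYA_USER" --password "$VIYA_PASSWORD"
    env:
      VIYA_ENDPOINT: $(viyaTestUrl)
      VIYA_USER: $(viyaUser)
      VIYA_PASSWORD: $(viyaPassword)   # stored as a secret pipeline variable
    displayName: Log in to the SAS Viya test environment
```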

Using Azure DevOps Pipelines was a business preference, but there are other pipeline options (there is a great video here that shows the key concepts of an Azure DevOps pipeline). Azure DevOps provides several ways of structuring the pipeline, incorporating parameters, variables and predefined tasks that provide access to Azure-based resources. We used a few tasks that were proprietary to Azure, but mostly the pipelines were built around bash scripts that will run on most Linux distributions. Using bash scripts also provides a practical way of developing and testing the pipeline logic.
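
Most of our steps therefore followed the pattern sketched below: a short inline bash script that reads pipeline variables mapped into its environment (the variable and script names are illustrative):

```yaml
variables:
  snowflakeSchema: 'TEST_RESULTS'   # hypothetical schema name

steps:
  - script: |
      set -euo pipefail
      # plain bash keeps the step portable across Linux distributions
      echo "Cleaning previous results from schema: $SNOWFLAKE_SCHEMA"
      ./scripts/clean_snowflake_results.sh "$SNOWFLAKE_SCHEMA"
    env:
      SNOWFLAKE_SCHEMA: $(snowflakeSchema)
    displayName: Clean previous test results
```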

To create a pipeline in Azure DevOps, enter the Pipelines section of your Azure DevOps project and hit the “New pipeline” button. This leads you through a wizard that connects the pipeline to the git repository and defines the initial pipeline. The pipeline can be edited directly inside Azure DevOps as well as in text-based editors, and it is managed in git along with the other artifacts in the repository. There is more about how to specify jobs in your pipeline here. Just as an example, our Build stage had jobs to prepare the test environment, clean the previous results from the Snowflake data warehouse, load the flows from the local repository to the test environment, and generate the SAS code for the flows ready for execution (sketched below).
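
A sketch of that Build stage; the script names are hypothetical stand-ins for our actual jobs, each wrapping the corresponding CLI or Python Viya tools calls:

```yaml
stages:
  - stage: Build
    jobs:
      - job: build
        steps:
          - script: ./scripts/prepare_test_env.sh
            displayName: Prepare the test environment
          - script: ./scripts/clean_snowflake_results.sh
            displayName: Clean previous results from Snowflake
          - script: ./scripts/load_flows.sh
            displayName: Load flows from the repository into the test environment
          - script: ./scripts/generate_flow_code.sh
            displayName: Generate SAS code for the flows
```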

 

Automating, accelerating, and migrating

It is always important to consider the practical side of moving data and analytics to the cloud. This blog series has discussed the key lessons from one migration, but generalized them more widely. The key lesson for me throughout was to focus on the business expectations and requirements, and then look for the fastest, most efficient way to achieve the changes required. This underpinned most of the operational decisions made during the migration process, whether that was the order in which to move data or minor issues about naming libraries. We also took full advantage of the tools within SAS Viya, such as the COMPARE procedure (PROC COMPARE).

I think this provides a useful way to migrate data and analytics to the cloud, taking advantage of the efficient integration between SAS Viya and Snowflake. Automating the process through a CICD pipeline was perhaps the icing on the cake, but I can certainly recommend this addition.

 
