
Moving your analytics and data to the cloud with SAS Viya and Snowflake: first things first

Started 10-23-2023
Modified 10-26-2023

Many organizations are now moving analytics and data to the cloud. The benefits include scalability, cost-efficiency, and accessibility. However, there are also practical questions that admins need to consider. This blog series looks at some of those questions. It is based on the experience of helping a global SAS customer to carry out this process. This first post looks at what you need to consider before you start, and how you might make those decisions, up to the start of the operationalization process.

 

Before you start: decisions, decisions...

The first aspect to consider is that you need both a modern analytics ecosystem and a modern data fabric, because the data fabric is what makes data available to your analytics ecosystem in the cloud. We used Snowflake, which provides a cloud-based data warehouse and is often used as a modern data platform. SAS Viya is a solid base for a modern analytics ecosystem, and it also works well with other cloud storage services. This is ideal, because data access needs to be provided through a set of storage services optimized for the customer.

 

The second aspect to consider is the data that need to be migrated. Essentially, you must understand the business needs, including compliance requirements and the volume, variety, velocity, and sensitivity of the data. This affects how you will transfer the data in practical terms. You also need to consider any specific requirements. In our case, there was a requirement to have data available for consumption beyond the SAS analytics platform, and making data available in Snowflake from SAS Viya supports that well.

 

A question of latency

Data has gravity. It is therefore important to consider latency when transferring data between data centers. In our case, the customer’s current environment is in Europe, and the Snowflake data warehouse is served by Azure in the eastern US. The connection from the on-prem environment to Snowflake in the US runs over a VPN, which causes high latency and is therefore less suitable for transferring large volumes of data. The new SAS Viya environment is served by SAS Cloud in an Azure data center in Europe.

 

It therefore made sense to transfer the data to SAS Cloud first, then between Azure data centers to Snowflake. This order of transfer could leverage the Azure backbone network, and therefore provide much lower latency. This also means that data needed on the SAS Viya environment can be made available directly there, where they originate, reducing the volume of data transfer needed.
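To reason about a choice like this, a rough estimate can help. The following is a minimal sketch, assuming a deliberately simplified model in which transfer time is the time to stream the bytes plus the time spent waiting on round trips. The function names and every number are hypothetical placeholders, not measurements from the project described here.

```python
def transfer_seconds(volume_gb, bandwidth_gbps, rtt_ms, round_trips):
    """Rough estimate for one hop: streaming time plus round-trip waiting time."""
    streaming = (volume_gb * 8) / bandwidth_gbps  # gigabytes -> gigabits, then / Gbit/s
    waiting = round_trips * (rtt_ms / 1000.0)     # each round trip costs one RTT
    return streaming + waiting


def route_seconds(hops):
    """Total estimate for a route made of several hops run one after another."""
    return sum(transfer_seconds(**hop) for hop in hops)


# Hypothetical placeholder figures for illustration only.
direct = route_seconds([
    # on-prem -> Snowflake over the VPN
    dict(volume_gb=100, bandwidth_gbps=0.5, rtt_ms=120, round_trips=10_000),
])
staged = route_seconds([
    # on-prem -> SAS Cloud in the same region
    dict(volume_gb=100, bandwidth_gbps=1.0, rtt_ms=20, round_trips=10_000),
    # SAS Cloud -> Snowflake between Azure data centers
    dict(volume_gb=100, bandwidth_gbps=5.0, rtt_ms=80, round_trips=10_000),
])

print(f"direct: {direct:.0f} s, staged: {staged:.0f} s")
```

Replacing the placeholders with measured bandwidth and round-trip times for your own routes shows whether a staged transfer pays off, since the extra hop only helps when the combined cost of both hops stays below the direct route.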

 

Consider alternatives

It is also worth identifying alternative approaches and evaluating them against each other. Generally, you want to evaluate each one on speed, cost, and general value as part of the process of moving to the cloud. In this case, we compared reusing existing assets in Data Integration Studio with rewriting the work for another platform in the cloud. We concluded that reuse would be:

  • Faster: time to value is shorter because less work is required than for rewriting. This approach only requires small changes to where data is read from and written to, in order to optimize it for the cloud. The changes are based on existing and proven data pipelines.
  • Cheaper: reusing existing and proven data pipelines typically requires 5–10 times less work, which reduces the cost. We can also automate testing and delivery through a continuous integration and continuous delivery (CICD) pipeline, which is also cheaper than manual testing and delivery.
  • Better: writing analytical data to Snowflake improves accessibility of analytical data for consumption outside SAS. Risk and complexity are reduced when working from proven data pipelines in a production environment. We would also be reducing the amount of work needed, and therefore freeing up resources.

 

Support business preferences

You also need to know about any business preferences that you will need to accommodate. For example, in this project, the customer had some data integration jobs that needed to be migrated and amended to write to slightly different places. The business preference was to keep the jobs as they were until after migration, rather than amend them first. This kept the jobs in their existing form and allowed the customer to decide later exactly how to use them. It also made the migration easier, because we were able to simply export them as a package.

 

The key issue was to transfer workflows exactly as they were. You can then work on them in the new environment to ensure that they are fit for purpose there. We used the function in SAS Viya that allows you to import a SAS Package directly. You simply need to decide which folder to import the package into. It is helpful to reestablish the SAS Library with the same name as in the source environment, because this saves mapping work. You can then amend the flows to write to Snowflake instead of SAS data sets.

 

Test and validate

Before any refactoring, it is good practice to write test code to validate whether your changes are implemented as expected. Within SAS, you can use the powerful PROC COMPARE functionality to compare what is being produced in Snowflake with the results from the source environment. You can also compare aggregated tables, including building in a level of tolerance for the different environments. We developed test code to run the tests in our CICD pipeline as well as to validate the changes to the flows. We reused test data and expected results from the source environment to have a baseline to work with for efficient validation.
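The idea of comparing new output against a baseline with a tolerance can be sketched outside SAS as well. The following is a minimal illustration in Python, not PROC COMPARE itself and not the test code used in the project. The function `compare_tables`, the column names, and the sample rows are all hypothetical, and it assumes both tables fit in memory as lists of dictionaries that share a unique key column.

```python
import math


def compare_tables(base, target, key, abs_tol=1e-6):
    """Return (key, column, message) tuples describing where target differs from base."""
    base_by_key = {row[key]: row for row in base}
    target_by_key = {row[key]: row for row in target}
    diffs = []
    for k in sorted(base_by_key.keys() | target_by_key.keys()):
        if k not in target_by_key:
            diffs.append((k, None, "missing in target"))
            continue
        if k not in base_by_key:
            diffs.append((k, None, "unexpected in target"))
            continue
        b, t = base_by_key[k], target_by_key[k]
        for col in sorted(b.keys() | t.keys()):
            bv, tv = b.get(col), t.get(col)
            if isinstance(bv, (int, float)) and isinstance(tv, (int, float)):
                # Numeric values may differ slightly between environments.
                equal = math.isclose(bv, tv, rel_tol=0.0, abs_tol=abs_tol)
            else:
                equal = bv == tv
            if not equal:
                diffs.append((k, col, f"{bv!r} != {tv!r}"))
    return diffs


# Hypothetical aggregated results: expected values from the source environment
# and the values read back from the new target.
baseline = [
    {"region": "EU", "total": 1250.50, "n": 3},
    {"region": "US", "total": 980.00, "n": 7},
    {"region": "APAC", "total": 10.0, "n": 1},
]
from_snowflake = [
    {"region": "EU", "total": 1250.5000004, "n": 3},
    {"region": "US", "total": 980.00, "n": 8},
]

for diff in compare_tables(baseline, from_snowflake, key="region"):
    print(diff)
```

A check like this can run as a pipeline step that fails when the list of differences is not empty. The tolerance is the main design choice: too tight and harmless floating-point differences between environments fail the build, too loose and real regressions slip through.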

 

Once you are confident that all the imported flows work as expected in the new environment, you can start the operationalization process. This is covered in the next part of the blog series, on Moving to Viya and Snowflake in the cloud using CICD and git.

 
