Many organizations are now moving analytics and data to the cloud. This has many benefits, including scalability, cost-efficiency, and accessibility. However, there are also some practical questions that admins need to consider. This blog series looks at some of those questions, drawing on our experience of helping a global SAS customer through this process. This first blog looks at what you need to consider before you start, and how you might make those decisions, up to the start of the operationalization process.
The first aspect to consider is the need to provide both a modern analytics ecosystem and a modern data fabric, so that data can be made available to your analytics ecosystem in the cloud. We used Snowflake, a cloud-based data warehouse that is often chosen as a modern data platform. SAS Viya, a solid base for a modern analytics ecosystem, also works well with other cloud storage services, which is ideal because data access needs to be provided through a set of storage services optimized for the customer.
The second aspect to consider is the data that needs to be migrated. Essentially, you must understand the business needs, including compliance requirements and the volume, variety, velocity, and sensitivity of the data. These factors determine how you will transfer the data in practical terms. You also need to consider any specific requirements. In our case, the data had to be available for consumption beyond the SAS analytics platform, and making data available in Snowflake from SAS Viya supports that well.
Data has gravity. It is therefore important to consider latency when transferring data between data centers. In our case, the customer's current environment is in Europe, and the Snowflake data warehouse is served by Azure in the eastern US. The connection between the on-prem environment and Snowflake in the US is a VPN connection, which introduces high latency and is therefore less suitable for transferring large volumes of data. The new SAS Viya environment is served by SAS Cloud from an Azure data center in Europe.
It therefore made sense to transfer the data to SAS Cloud first, and then between Azure data centers to Snowflake. This order of transfer leverages the Azure backbone network and therefore provides much lower latency. It also means that data needed in the SAS Viya environment can be made available directly there, where it originates, reducing the volume of data transfer needed.
It is also worth identifying alternative approaches and evaluating them against each other. Generally, you want to assess each one on speed, cost, and overall value as part of the process of moving to the cloud. In this case, we compared reusing the existing assets built in SAS Data Integration Studio with rewriting the work for another platform in the cloud, and concluded that reusing them offered the better balance of speed, cost, and value.
You also need to know about any business preferences that you will need to accommodate. For example, in this project, the customer had data integration jobs that needed to be migrated and amended to write to slightly different targets. The business preference was to keep the jobs as they were until after migration, rather than amend them first. This allowed the customer to decide later exactly how to use the jobs, and kept them in their existing form. It also made the migration easier, because we were able to simply export them as a package.
The key requirement was to transfer the workflows exactly as they were. You can then work on them in the new environment to ensure that they are fit for purpose there. We used the function in SAS Viya that allows you to import a SAS package directly; you simply need to decide which folder to import the package into. It is helpful to re-establish the SAS library with the same name as in the source environment, because this saves mapping work. You can then amend the flows to write to Snowflake instead of SAS data sets.
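As a rough illustration of that last step, the sketch below reassigns the library the flows write to so that it points at Snowflake. It assumes SAS/ACCESS Interface to Snowflake is licensed; the libref DWH, the account URL, and all connection values are placeholders rather than the customer's actual settings.

```sas
/* Minimal sketch: point the libref the DI Studio flows write to
   at Snowflake instead of SAS data sets. Keeping the libref name
   (here DWH) the same as in the source environment means the job
   code itself does not need to change. All values are placeholders. */
libname DWH snowflake
   server="myaccount.snowflakecomputing.com"  /* placeholder account host */
   warehouse=MY_WH                            /* placeholder virtual warehouse */
   database=MY_DB
   schema=PUBLIC
   user=&sf_user password="&sf_pw";           /* credentials from secure macro variables */
```

Because only the library assignment changes, the amended flows can be promoted through environments by swapping the connection values rather than editing each job.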
Before any refactoring, it is good practice to write test code that validates whether your changes behave as expected. Within SAS, you can use the powerful PROC COMPARE procedure to compare what is produced in Snowflake with the results from the source environment. You can also compare aggregated tables, including building in a level of tolerance for differences between the environments. We developed test code that ran in our CI/CD pipeline as well as validating the changes to the flows, and we reused test data and expected results from the source environment as a baseline for efficient validation.
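A hedged sketch of that validation pattern follows. The library and table names (baseline.sales_agg, DWH.sales_agg) are hypothetical placeholders and the tolerance is illustrative; the pattern relies on PROC COMPARE setting the SYSINFO macro variable to 0 only when no differences are found, which lets a CI/CD pipeline fail on a mismatch.

```sas
/* Compare a table rebuilt in Snowflake against the baseline result
   carried over from the source environment. Placeholder names. */
proc compare base=baseline.sales_agg      /* expected results from source env */
             compare=DWH.sales_agg        /* same table produced in Snowflake */
             method=absolute criterion=1e-6;  /* tolerance for numeric differences */
run;

/* PROC COMPARE sets &SYSINFO to 0 only when the tables match,
   so the test can signal failure to the CI/CD pipeline. */
%macro check_compare;
   %if &sysinfo ne 0 %then %do;
      %put ERROR: Validation failed, SYSINFO=&sysinfo;
      %abort cancel;
   %end;
%mend check_compare;
%check_compare;
```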
Once you are confident that all the imported flows work as expected in the new environment, you can start the operationalization process. This is covered in the next part of the blog series, on moving to SAS Viya and Snowflake in the cloud using CI/CD and Git.