BookmarkSubscribeRSS Feed
larsarne
SAS Employee

When migrating your SAS 9.4 content to SAS Viya in the cloud – you want to work with data that is optimized for the cloud - how about using Snowflake? And make sure that the data falls right into and from Snowflake(winter pun for the season 🌨😀 ) Snowflake works well in the cloud – and only works in the cloud. And while migrating what you made in SAS 9.4 to SAS Viya in the cloud it is a simple matter to make data available into Snowflake. Through our close partnership with Snowflake, SAS has an optimized access engine with Snowflake that ensures smooth access taking advantage of pushdown SQL so that data movement can be minimized. We find that many customers of SAS use Snowflake in order to make data processed by SAS available for consumption outside of the SAS environment.

 

When doing that, you want to make sure that what you get in Snowflake is right – it may very well be that when migrating to a different platform we may get something different from what we expect.

 

In our case, we want to migrate this DI Studio job as shown below:

larsarne_1-1702301902942.png

 

To this SAS Studio flow in SAS Viya:

larsarne_0-1702314128473.png

The migration as such was done fully automatic – but the destination tables were changed to a Snowflake based library so that we would make the aggregated data available in Snowflake for others to consume rather than SAS Datasets.

 

A very efficient way of making sure that we get what we expect in Snowflake is by comparing what we get in Snowflake with what was produced in the SAS DI Studio job as SAS datasets. It so happens that SAS comes with a PROC COMPARE function that does exactly that.

 

Still – there are a few things to be aware of when comparing SAS datasets to a relational datawarehouse like Snowflake:

  • Data in relational data warehouses like Snowflake are unsorted – so for PROC COMPARE to work right you would need to sort the data
  • And while you are at it, you may want to sort the SAS dataset you are comparing with even though SAS Datasets do have a default sort order. If you know the default sort order of the dataset – you can just sort the Snowflake data in the same sort order and compare directly
  • Numbers in Snowflake are represented differently from SAS, so they may end up with some differences if they are a result of aggregation – so when comparing what kind of tolerance would you allow to accept the comparison?
  • PROC COMPARE simply compare all data row by row – so you may want to control the number of rows you compare to ensure it complete in a reasonable amount of time. Use PROC COMPARE with a WHERE clause if you need to filter the data for instance.

 

Use of PROC COMPARE to compare the Snowflake table produced by the migrated SAS Studio flows to the expected SAS datasets produced by the original SAS DI Studio jobs will look like this:

proc sort data=SNOWFL.Toy_Summary out=Toy_Summary;
by Date ProductBrand ProductLine Product;
run;

proc sort data=EXPECTED.Toy_Summary out=exp_Toy_Summary;
    by Date ProductBrand ProductLine Product;
    run;
    
proc compare 
    base=exp_Toy_Summary 
    compare=Toy_Summary
    METHOD = absolute 
    criterion = 0.01
    briefsummary;
    id Date ProductBrand ProductLine Product;
run;

proc sort data=SNOWFL.Novelty_Summary out=Novelty_Summary;
by Date ProductBrand ProductLine Product;
run;

proc sort data=EXPECTED.Novelty_Summary out=exp_Novelty_Summary ;
by Date ProductBrand ProductLine Product;
run;

proc compare 
    base=exp_Novelty_Summary 
    compare=Novelty_Summary    
    METHOD = absolute 
    criterion = 0.01
    briefsummary;
    id Date ProductBrand ProductLine Product;
run;  

 

Here we are using the criterion of 0.01 to allow for a 2 decimal point difference when comparing numbers. This can make sense if we are considering these numbers used for accounting purposes.

 

Such a test can be used as part of automating the test of the SAS Studio flow we just migrated and by that be used in a CICD pipeline as well.

If you found this juletip useful, you may enjoy these blogs as well on how to migrate your SAS 9.4 Enterprise Guide projects and SAS 9.4 DI jobs to SAS Viya Studio flows:

 

And by this approach we have validated that our Snowflake data falls 🌨 nicely and can be used for analytical consumption also outside of SAS.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Discussion stats
  • 0 replies
  • 514 views
  • 1 like
  • 1 in conversation