In the last few weeks, I’ve been working a lot on the integration between SAS Viya and Teradata. In this article, I want to highlight some of the steps required to make SAS Viya and Teradata run together in the cloud.
SAS and Teradata have been strategic partners for more than 14 years, and this partnership is still active and dynamic, with most of the joint SAS-Teradata capabilities already available in SAS Viya 2021.x.
Before illustrating SAS-Teradata features in future posts, let’s review the high-level principles of a SAS-Teradata setup on Azure.
There is a lot of enablement material about deploying SAS Viya on cloud providers in general and on Microsoft Azure in particular. This is a wide and tough topic. The purpose of this article is not to describe how to deploy SAS Viya on Azure, so I suggest you start with Raphaël Poumarede's article, which summarizes the process and options pretty well.
So, we’ll assume here that we have successfully deployed SAS Viya on Azure.
Yes, you can deploy Teradata on the cloud!
Teradata built its reputation over the last 40 years by providing turnkey, massive on-premises data warehousing appliances: a blend of processors, memory, storage disks, an operating system and software wired together. Much like Apple MacBooks or iPhones, the hardware and software were inseparable.
In the last few years, Teradata has successfully transitioned to a “multi-cloud” and “software” company, providing its data warehousing capabilities as software on the different public cloud providers.
Teradata’s current offering is named Teradata Vantage. Teradata Vantage is a modern data warehousing platform built on a core relational database.
SAS and Teradata joint capabilities are based on that core relational database.
Teradata’s flagship product in Azure is Teradata Vantage As-a-Service (AaS). With Teradata Vantage AaS in Azure, Teradata manages system installation and administration tasks for the customer, including installing and upgrading RPM packages such as the SAS Embedded Process (discussed later).
Another option in the Microsoft Azure Marketplace caught my interest: Teradata Vantage (DIY).
This Do-It-Yourself offering is a very flexible and easy way to deploy and customize Teradata Vantage on your own without any help. It allows a Cloud Architect to deploy Teradata in minutes and choose many options like:
The beauty of it: 45 minutes to 1 hour later, you’ll get your Teradata Vantage instance on Azure ready to use.
It is also possible to deploy Teradata using a script and Azure Resource Manager.
With Teradata Vantage (DIY) in Azure, the customer/user manages all system creation, upgrades and administration tasks. Therefore, the customer/user is responsible for installing the SAS Embedded Process (discussed later) on Vantage DIY in Azure.
You will also need to initialize your Teradata environment: create additional databases and users, load some data files, etc.
Now, we should have 2 Azure Resource Groups:
For SAS Viya and Teradata Vantage to connect with each other seamlessly, we need to define a Virtual Network Peering between SAS Viya’s “vnet” and Teradata’s. Once peered, resources in either virtual network can connect directly with resources in the other.
Peering can be defined locally (same Azure region) or globally (across Azure regions), and it needs to be defined in both directions (SAS to Teradata and Teradata to SAS).
The peerings can be set up in the Azure Portal:
Or using the Azure CLI:
az network vnet peering create -g ${MYUSER}-azuredm-rg \
    -n SAS2Teradata \
    --vnet-name ${MYUSER}-azuredm-vnet \
    --remote-vnet ${TDVNETID} \
    --allow-vnet-access \
    --allow-forwarded-traffic

az network vnet peering create -g ${MYUSER}-azuredm-teradata \
    -n Teradata2SAS \
    --vnet-name vnet-teradata \
    --remote-vnet ${SASVNETID} \
    --allow-vnet-access \
    --allow-forwarded-traffic
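The two commands above rely on ${TDVNETID} and ${SASVNETID}, which are presumably the resource IDs of the two virtual networks. Here is a minimal sketch of how those IDs could be retrieved and how the peering state could be checked afterwards. It assumes the resource group, VNet and peering names used above. `run` and `DRY_RUN` are hypothetical helpers (not Azure CLI features) that print each command by default instead of executing it.

```shell
# Sketch only: run() and DRY_RUN are hypothetical helpers, not Azure CLI features.
DRY_RUN="${DRY_RUN:-1}"      # 1 = print the az command, 0 = execute it
MYUSER="${MYUSER:-myuser}"   # placeholder prefix, as in the commands above

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Resource ID of a virtual network (the value passed to --remote-vnet)
vnet_id() {   # usage: vnet_id <resource-group> <vnet-name>
  run az network vnet show -g "$1" -n "$2" --query id -o tsv
}

# State of one side of a peering
peering_state() {   # usage: peering_state <resource-group> <vnet-name> <peering-name>
  run az network vnet peering show -g "$1" --vnet-name "$2" -n "$3" \
    --query peeringState -o tsv
}

vnet_id "${MYUSER}-azuredm-rg" "${MYUSER}-azuredm-vnet"
vnet_id "${MYUSER}-azuredm-teradata" "vnet-teradata"
peering_state "${MYUSER}-azuredm-rg" "${MYUSER}-azuredm-vnet" SAS2Teradata
peering_state "${MYUSER}-azuredm-teradata" "vnet-teradata" Teradata2SAS
```

With DRY_RUN=0, something like `TDVNETID=$(vnet_id "${MYUSER}-azuredm-teradata" vnet-teradata)` would capture the ID, and both peerings should eventually report `Connected`.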
For SAS to interact with Teradata, a Teradata client is required, as usual. Now that SAS Viya is deployed in Kubernetes, this topic is a little more complex than before.
Indeed, you need to make the client available to all the SAS Viya pieces that need it (SAS Compute Server + all CAS nodes). In Kubernetes, that means installing the client on shared persistent storage, for example NFS. See Raphaël Poumarede's article for more details.
The required Teradata client is “Teradata Tools and Utilities” (TTU), which is available for download.
In my setup, here is what I did:
The Teradata client is now available from all the pods requiring it.
Additional configuration is needed in SAS Viya to take that Teradata client into account and to specify the right client encoding. This is documented in the “SAS Viya Readme” (accessible in the “My SAS” portal or in the deployment directory):
sas-access.properties:
##########################
# SAS/ACCESS to Teradata
##########################
COPLIB=/azuredm/access-clients/teradata
TERADATA=/azuredm/access-clients/teradata/client/17.10/lib64
clispb.dat:
charset_type=N
charset_id=UTF8
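Before moving on, it can be worth checking that the paths referenced in this configuration really exist on the shared storage. The following is only a sketch: `check_td_client` is a hypothetical helper, and it assumes that clispb.dat lives in the COPLIB directory and that the TERADATA value points to a directory, as in the settings above.

```shell
# Sketch only: check_td_client is a hypothetical helper; the paths mirror
# the COPLIB and TERADATA values shown above.
check_td_client() {   # usage: check_td_client <coplib-dir> <lib-dir>
  coplib="$1"
  libdir="$2"
  rc=0
  if [ ! -f "$coplib/clispb.dat" ]; then
    echo "missing: $coplib/clispb.dat"
    rc=1
  elif ! grep -q '^charset_id=UTF8' "$coplib/clispb.dat"; then
    echo "clispb.dat does not set charset_id=UTF8"
    rc=1
  fi
  if [ ! -d "$libdir" ]; then
    echo "missing: $libdir"
    rc=1
  fi
  if [ "$rc" -eq 0 ]; then
    echo "Teradata client layout looks fine"
  fi
  return "$rc"
}

# Run from a pod or host that mounts the shared storage
check_td_client /azuredm/access-clients/teradata \
  /azuredm/access-clients/teradata/client/17.10/lib64 \
  || echo "Teradata client check failed"
```

This only checks the file layout and the encoding setting. It does not prove that SAS can load the client libraries.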
There you go. You should be ready to connect to Teradata from SAS. The last thing you need is the private IP address of one of the Teradata nodes.
You can get it from the Azure Portal:
Or you can get it with the Azure CLI:
az network nic ip-config list -g ${MYUSER}-azuredm-teradata --nic-name database-nic00 \
--query "[?name=='ip1'].{privateIpAddress:privateIpAddress}" -o tsv
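Since the peering works with private addresses, a quick sanity check on the value returned by the query can avoid a confusing connection error later. This is a minimal sketch, not an Azure CLI feature: `is_private_ipv4` is a hypothetical helper that accepts only the RFC 1918 ranges (10/8, 172.16/12, 192.168/16), and the default address is just a placeholder.

```shell
# Sketch only: is_private_ipv4 is a hypothetical helper, not an Azure CLI feature.
is_private_ipv4() {   # returns 0 for 10/8, 172.16/12 and 192.168/16 addresses
  case "$1" in
    *[!0-9.]*|*..*|.*|*.|"") return 1 ;;
  esac
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address into its octets
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
  case "$1" in
    10)  return 0 ;;
    172) [ "$2" -ge 16 ] && [ "$2" -le 31 ] ;;
    192) [ "$2" -eq 168 ] ;;
    *)   return 1 ;;
  esac
}

# Placeholder value; in practice, capture the output of the az query above
TD_PRIVATE_IP="${TD_PRIVATE_IP:-10.1.1.4}"
if is_private_ipv4 "$TD_PRIVATE_IP"; then
  echo "%let TD_PRIVATE_IP=$TD_PRIVATE_IP ;"
else
  echo "unexpected address: $TD_PRIVATE_IP" >&2
fi
```

When the address looks right, the script prints a `%let` statement that can be pasted at the top of a SAS program.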
The code to test the access to Teradata from SAS or CAS will be similar to this:
%let TD_PRIVATE_IP=10.1.1.4 ;
/* Assign Teradata Library */
libname sastera teradata server="&TD_PRIVATE_IP" database="gelindb"
schema="gelindb" user="xxx" password="xxx" bulkload=yes dbcommit=0 ;
/* List library contents */
proc datasets lib=sastera ;
quit ;
/* Start a CAS session */
cas mysession ;
caslib tera datasource=(srctype="teradata" server="&TD_PRIVATE_IP" database="gelindb"
schema="gelindb" username="xxx" password="xxx") libref=castera ;
/* List files in Teradata */
proc casutil ;
list files ;
quit ;
cas mysession terminate ;
So far, we have been able to integrate SAS Viya and Teradata on Azure. That’s one step that unlocks some basic capabilities between SAS and Teradata (data access, optimized data movement, in-database processing, bulk loading/unloading, etc.).
But the SAS – Teradata integration is not limited to that. It offers much more. Additional advanced features are enabled with the deployment of the SAS Embedded Process, a lightweight SAS engine, on the Teradata nodes.
Here is what you need to do to deploy SAS Embedded Process and its supporting functions in Teradata and to leverage all SAS and Teradata capabilities:
You will have to ssh to the Teradata nodes in order to perform these actions. Thus, a public IP address might have to be configured on the Azure/Teradata side, as well as some firewall rules.
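As an illustration of such a firewall rule, here is a sketch that builds an inbound SSH rule for a Network Security Group, restricted to a single administrator address. Everything specific in it is an assumption on my side: the NSG name, the rule name, the priority and the source address (a documentation-only IP) are placeholders, and your Teradata deployment may already define its own rules.

```shell
# Sketch only: the NSG name, rule name, priority and source address are placeholders.
ssh_rule_cmd() {   # usage: ssh_rule_cmd <resource-group> <nsg-name> <admin-source-ip>
  echo az network nsg rule create -g "$1" --nsg-name "$2" -n AllowSshFromAdmin \
    --priority 1000 --direction Inbound --access Allow --protocol Tcp \
    --source-address-prefixes "$3/32" --destination-port-ranges 22
}

# Print the command for review; run it only once the values are right
ssh_rule_cmd "${MYUSER:-myuser}-azuredm-teradata" "teradata-nsg" "203.0.113.10"
```

Printing the command first, rather than running it directly, leaves room to review the rule before opening port 22 to anything.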
To fully leverage massively parallel, multi-machine to multi-machine communication (CAS nodes in SAS Viya to Teradata nodes), don’t forget to enable “Data Connector Ports” in the SAS Viya deployment. This is documented in the “SAS Viya Readme” (accessible in the “My SAS” portal or in the deployment directory).
With the SAS Embedded Process and its supporting functions correctly deployed, SAS users will now be able to:
We'll talk about those nice features in future articles.
Thanks for reading.