SAS Viya and Teradata Together on Microsoft Azure

Started 09-24-2021
Modified 09-30-2021

In the last few weeks, I’ve been working a lot on the integration between SAS Viya and Teradata. In this article, I want to highlight some of the steps required to make SAS Viya and Teradata run together in the cloud.

 

SAS and Teradata have been strategic partners for more than 14 years, and this partnership is still active and dynamic: most of the joint SAS-Teradata capabilities are already available in SAS Viya 2021.x.

 

Before addressing and illustrating SAS-Teradata features in future posts, let’s review high-level principles about a SAS-Teradata setup on Azure.

Deploy SAS Viya on Azure

There is a lot of enablement material on deploying SAS Viya with cloud providers in general and on Microsoft Azure in particular. This is a broad and difficult topic. The purpose of this article is not to describe how to deploy SAS Viya on Azure, so I suggest you start with Raphaël Poumarede's article, which summarizes the process and options pretty well.

 

So, we’ll assume here that we have successfully deployed SAS Viya on Azure. ;-)

Deploy Teradata on Azure

Yes, you can deploy Teradata on the cloud!

 

Teradata has built its reputation over the last 40 years by providing turnkey, massive on-premises data warehousing appliances: perfect blends of processors, memory, storage disks, an operating system and software, all wired together. More or less like Apple MacBooks or iPhones, the hardware and software are inseparable.

 

nir_post_67_01_teradata_appliance.jpg


 

Over the last few years, Teradata has successfully transitioned to a “multi-cloud” and “software” company, providing its data warehousing capabilities as software on the different public cloud providers.

 

Teradata’s current offering is named Teradata Vantage. Teradata Vantage is a modern data warehousing platform built on a core relational database.

 

SAS and Teradata joint capabilities are based on that core relational database.

How to deploy Teradata Vantage on Azure?

Teradata’s flagship product in Azure is Teradata Vantage As-a-Service (AaS). With Teradata Vantage AaS in Azure, Teradata manages system installation and administration tasks for the customer, including the installation and upgrade of RPM packages such as the SAS Embedded Process (discussed later).

 

Another option in the Microsoft Azure Marketplace caught my particular interest: Teradata Vantage (DIY).

 

nir_post_67_02_teradata_marketplace.png

 

This Do-It-Yourself offering is a very flexible and easy way to deploy and customize Teradata Vantage on your own. It allows a Cloud Architect to deploy Teradata in minutes and to choose many options, such as:

  • the database version
  • the database tier from “Developer” to “Enterprise”
  • the number and size of the database nodes
  • the storage associated with your database nodes
  • some database features
  • additional services (Viewpoint, Data Mover, QueryGrid Manager, etc.)

nir_post_67_03_teradata_azure_setup.png

 

The beauty of it: 45 minutes to 1 hour later, you’ll get your Teradata Vantage instance on Azure ready to use.

 

It is also possible to deploy Teradata using a script and Azure Resource Manager.
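
As a rough illustration of what such a scripted deployment could look like, the Azure CLI can create a resource group and submit an ARM template to it. This is only a sketch: the function name, the template file and the parameter file are hypothetical placeholders, not Teradata's actual deployment artifacts.

```shell
# Sketch only: create a resource group and deploy an ARM template into it.
# The template and parameter files are hypothetical placeholders.
deploy_teradata() {  # usage: deploy_teradata <resource_group> <location> <template.json> <params.json>
  rg="$1"; location="$2"; template="$3"; params="$4"
  az group create --name "$rg" --location "$location" &&
  az deployment group create \
    --resource-group "$rg" \
    --template-file "$template" \
    --parameters "@${params}"
}

# Example call (all values are assumptions to adapt):
# deploy_teradata "${MYUSER}-azuredm-teradata" eastus teradata-template.json teradata-params.json
```

Keeping the template and its parameters in files makes the deployment repeatable, which can be handy when an environment is created and deleted often.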

 

With Teradata Vantage (DIY) in Azure, the customer/user manages all system creation, upgrades and administration tasks. Therefore, the customer/user is responsible for installing the SAS Embedded Process (discussed later) on Vantage DIY in Azure.

 

You will also need to initialize your Teradata environment: create additional databases, users, load some data files, etc.

 

Now, we should have 2 Azure Resource Groups:

  • one for SAS Viya
  • one for Teradata Vantage

Define a Virtual Network Peering

In order for SAS Viya and Teradata Vantage to connect seamlessly with each other, we need to define a Virtual Network Peering between SAS Viya’s “vnet” and Teradata’s. Once peered, resources in either virtual network can connect directly to resources in the other.

 

Peering can be defined locally (within the same Azure region) or globally (across Azure regions), and it needs to be defined in both directions (SAS <> Teradata).

 

Peerings can be set up in the Azure Portal:

 

nir_post_67_04_azure_peering.png

 

Or using the Azure CLI:

 

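# Peer the SAS Viya VNet with the Teradata VNet.
# ${TDVNETID} is the full resource ID of the Teradata VNet, which is in a different resource group.
az network vnet peering create -g ${MYUSER}-azuredm-rg \
  -n SAS2Teradata \
  --vnet-name ${MYUSER}-azuredm-vnet \
  --remote-vnet ${TDVNETID} \
  --allow-vnet-access \
  --allow-forwarded-traffic

# Peer the Teradata VNet back to the SAS Viya VNet.
# ${SASVNETID} is the full resource ID of the SAS Viya VNet.
az network vnet peering create -g ${MYUSER}-azuredm-teradata \
  -n Teradata2SAS \
  --vnet-name vnet-teradata \
  --remote-vnet ${SASVNETID} \
  --allow-vnet-access \
  --allow-forwarded-traffic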
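
The peering commands rely on the two VNet resource IDs and do not confirm that the link is up. A minimal sketch of how the IDs could be looked up and the peering state checked is shown here; the helper function names are mine, and the resource names in the usage comments are the ones from my environment.

```shell
# Return the full resource ID of a virtual network.
vnet_id() {  # usage: vnet_id <resource_group> <vnet_name>
  az network vnet show -g "$1" -n "$2" --query id -o tsv
}

# Return the state of a peering; "Connected" is expected once both
# directions have been created.
peering_state() {  # usage: peering_state <resource_group> <vnet_name> <peering_name>
  az network vnet peering show -g "$1" --vnet-name "$2" -n "$3" \
    --query peeringState -o tsv
}

# Example usage:
# TDVNETID=$(vnet_id "${MYUSER}-azuredm-teradata" vnet-teradata)
# SASVNETID=$(vnet_id "${MYUSER}-azuredm-rg" "${MYUSER}-azuredm-vnet")
# peering_state "${MYUSER}-azuredm-rg" "${MYUSER}-azuredm-vnet" SAS2Teradata
```

A state of "Initiated" usually means that only one of the two directions has been defined so far.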
  

Make the Teradata Client available to SAS Viya

As usual, a Teradata client is required for SAS to interact with Teradata. Because SAS Viya is now deployed in Kubernetes, this topic is a little more complex than before.

 

Indeed, you need to make the client available to all the SAS Viya components that need it (SAS Compute Server + all CAS nodes). In Kubernetes, that means installing the client on shared persistent storage, which could be NFS, for example. See Raphaël Poumarede's article for more details.

 

The Teradata client that is needed is “Teradata Tools and Utilities” (TTU), which is available for download.

 

In my setup, here is what I did:

  • I deployed SAS Viya on Azure with an NFS Server VM (standard option) – this automatically makes a Persistent Volume available to all my pods – I will use it to share the Teradata client
  • On a separate, temporary Linux machine, I installed the “Teradata Tools and Utilities” client (I only picked the tools required by SAS Viya) in a custom directory – then I created an archive from that directory
  • I uploaded the archive to my Azure NFS Server VM and uncompressed it

The Teradata client is now available from all the pods requiring it.
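
The archive-and-copy steps above can be sketched with standard tools. This is an illustration rather than the exact commands I ran: the function names, host name and paths in the usage comments are hypothetical and need to be adapted to your NFS export.

```shell
# Pack a client directory into a gzip-compressed archive.
pack_client() {  # usage: pack_client <client_dir> <archive.tgz>
  tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Unpack the archive under a target directory (created if missing).
unpack_client() {  # usage: unpack_client <archive.tgz> <target_dir>
  mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Example (hypothetical host and paths):
# pack_client /opt/teradata /tmp/teradata-client.tgz
# scp /tmp/teradata-client.tgz user@nfs-server:/tmp/
# then, on the NFS server:
# unpack_client /tmp/teradata-client.tgz /export/access-clients
```

Archiving relative to the parent directory keeps the top-level folder name in the archive, so the client lands in a predictable subdirectory of the shared volume.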

 

Additional configuration is needed in SAS Viya to take that Teradata client into account and to specify the right client encoding. This is documented in the “SAS Viya Readme” (accessible in the “My SAS” portal or in the deployment directory):

 

sas-access.properties:

 

##########################
# SAS/ACCESS to Teradata
##########################
COPLIB=/azuredm/access-clients/teradata
TERADATA=/azuredm/access-clients/teradata/client/17.10/lib64

clispb.dat:

charset_type=N
charset_id=UTF8

Validate the Connection

There you go. You should be ready to connect to Teradata from SAS. The last thing you need is the private IP address of one of the Teradata nodes.

 

You can get it from the Azure Portal:

 

nir_post_67_05_teradata_ip_address.png

 

Or you can get it with the Azure CLI:

 

az network nic ip-config list -g ${MYUSER}-azuredm-teradata --nic-name database-nic00 \
  --query "[?name=='ip1'].{privateIpAddress:privateIpAddress}" -o tsv
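
Before switching to SAS, it can be useful to check from a machine inside the peered SAS Viya network that the database port is reachable. The sketch below wraps the lookup above in a function and adds a simple TCP probe. The function names are mine, the probe needs bash and the `timeout` utility, and 1025 is the default Teradata port, assuming it has not been changed.

```shell
# Return the private IP address of a Teradata node's network interface.
teradata_private_ip() {  # usage: teradata_private_ip <resource_group> <nic_name>
  az network nic ip-config list -g "$1" --nic-name "$2" \
    --query "[?name=='ip1'].{privateIpAddress:privateIpAddress}" -o tsv
}

# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
can_reach() {  # usage: can_reach <host> <port>
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example usage:
# TD_PRIVATE_IP=$(teradata_private_ip "${MYUSER}-azuredm-teradata" database-nic00)
# can_reach "$TD_PRIVATE_IP" 1025 && echo "Teradata port is reachable"
```

If the probe fails, the peering state and the network security rules are the first things to look at.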
  

 

The code to test the access to Teradata from SAS or CAS will be similar to this:

 

%let TD_PRIVATE_IP=10.1.1.4 ;

/* Assign Teradata Library */
libname sastera teradata server="&TD_PRIVATE_IP" database="gelindb"
   schema="gelindb" user="xxx" password="xxx" bulkload=yes dbcommit=0 ;

/* List library contents */
proc datasets lib=sastera ;
quit ;

/* Start a CAS session */
cas mysession ;

caslib tera datasource=(srctype="teradata" server="&TD_PRIVATE_IP" database="gelindb"
   schema="gelindb" username="xxx" password="xxx") libref=castera ;

/* List files in Teradata */
proc casutil ;
   list files ;
quit ;

cas mysession terminate ;

Install SAS Embedded Process in Teradata

So far, we have been able to integrate SAS Viya and Teradata on Azure. That’s one step that unlocks some basic capabilities between SAS and Teradata (data access, optimized data movement, in-database processing, bulk loading/unloading, etc.).

 

But the SAS – Teradata integration is not limited to that. It offers much more. Additional advanced features are enabled with the deployment of the SAS Embedded Process, a lightweight SAS engine, on the Teradata nodes.

 

Here is what you need to do to deploy SAS Embedded Process and its supporting functions in Teradata and to leverage all SAS and Teradata capabilities:

  1. Get installation files
    • Get SAS Embedded Process RPM file from SAS Mirror Manager
    • Get SAS Embedded Process Support Functions from a Teradata representative
    • Package a Quality Knowledge Base (QKB) into an RPM file
  2. Install SAS Embedded Process components
    • Use Teradata Parallel Upgrade Tool (PUT) to install the RPMs (SAS Embedded Process + QKB) on all Teradata nodes
    • Install the SAS Embedded Process Support Functions
    • Install SAS Data Quality procedures

You will have to ssh to the Teradata nodes in order to perform these actions. Thus, a public IP address might have to be configured on the Azure/Teradata side, as well as some firewall rules.
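
As an example of such a firewall rule, an inbound SSH rule could be added to the network security group that protects the Teradata nodes. This is a sketch under assumptions: the function name, rule name and priority are placeholders, and the name of the network security group depends on your deployment.

```shell
# Allow inbound SSH (TCP 22) from a single source address or CIDR range.
# The rule name and priority are placeholders to adapt.
allow_ssh_from() {  # usage: allow_ssh_from <resource_group> <nsg_name> <source_cidr>
  az network nsg rule create \
    --resource-group "$1" --nsg-name "$2" \
    --name AllowSshFromAdmin --priority 1000 \
    --direction Inbound --access Allow --protocol Tcp \
    --source-address-prefixes "$3" \
    --destination-port-ranges 22
}

# Example (hypothetical NSG name and admin address):
# allow_ssh_from "${MYUSER}-azuredm-teradata" teradata-nsg 203.0.113.10/32
```

Restricting the source to a known address range avoids exposing SSH on the public IP to the whole internet.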

 

To fully leverage massively parallel communication between multiple machines on both sides (CAS nodes in SAS Viya, Teradata nodes), don’t forget to enable “Data Connector Ports” in the SAS Viya deployment. This is documented in the “SAS Viya Readme” (accessible in the “My SAS” portal or in the deployment directory).

 

With the SAS Embedded Process and its supporting functions correctly deployed, SAS users will now be able to:

  • Run SAS Data Quality jobs in Teradata (SAS Data Quality Accelerator for Teradata)
  • Load Teradata data into CAS in a massively parallel mode (SAS Data Connect Accelerator for Teradata)
  • Execute SAS Analytics scoring models / SAS Intelligent Decisioning rule sets/decisions in Teradata (SAS Scoring Accelerator for Teradata)

We'll talk about those nice features in future articles.

 

Thanks for reading.

