In the new release of Viya, it is certainly not a surprise to say that there is a lot of new capabilities. Huh? Let’s briefly highlight some of the new options offered regarding the access of data from SAS Cloud Analytic Services (CAS).
First new big item on this space is the ability to load data in CAS from an Amazon Web Services (AWS) S3 location. Write-back is also possible. Supported file formats are SASHDAT and CSV.
It adds a new option to the Platform Data Sources already available (PATH, HDFS, DNFS) that allow to persist data in the SASHDAT and CSV formats, and it enables to leverage a cloud-based storage service. It supports parallel reads and writes of the data on a MPP CAS infrastructure.
CAS access to S3 data is done through the use of a CASLIB. Here is an example of how to create a CASLIB that points to S3:
caslib mys3 datasource=(
srctype="s3",
bucket="mybucket",
awsConfigPath="/home/myuser/.aws/config",
awsCredentialsPath="/home/myuser/.aws/credentials",
region="US_East",
objectPath="/mypath/"
) ;
Second theme in this area is the coming availability of new Data Connectors and Data Connect Accelerators:
Regarding Spark, as mentioned, it is planned for Limited Availability, meaning that SAS Product Management must be involved.
JDBC is a good alternative to ODBC for generic access to lots of databases. It is much easier to setup than ODBC. If your CAS environment has the correct setup regarding JAVA, then you only need:
Some examples with PostgreSQL and Oracle:
caslib jdbcpg desc='JDBC PostgreSQL'
dataSource=(srctype='jdbc',
url="jdbc:postgresql://mydb.example.com:5432/casdm",
authenticationdomain="PGAuth",
schema="public",
class="org.postgresql.Driver",
classpath="/opt/sas/jdbc") ;
caslib jdbcora desc='JDBC Oracle'
dataSource=(srctype='jdbc',
url="jdbc:oracle:thin:@//mydb.example.com:1521/XE",
authenticationdomain="ORAAuth",
schema="HR",
class="oracle.jdbc.OracleDriver",
classpath="/opt/sas/jdbc") ;
Last point in this topic is the new FedSQL explicit pass-through facility. Viya 3.3 introduced implicit pass-through. Now it’s time for explicit pass-through.
Explicit pass-through enables users to send SQL queries directly to the database for execution. In the CAS context, it means that you will be able to run a specific SQL query inside the database, using the database syntax, and get the results back directly loaded into CAS.
Here is an example:
caslib PG datasource=(srctype="postgres",authenticationdomain="PGAuth",
server="mydb.example.com",database="dvdrental",schema="public") libref=PG ;
caslib ORA datasource=(srctype="oracle",authenticationdomain="ORAAuth",
path="//mydb.example.com:1521/xe",schema="CASDM") libref=ORA ;
proc fedsql sessref=mySession _method ;
create table PG.myresults{options replace=true} as
select film.title
from connection to PG(select "title", "film_id" from "film") as film,
connection to ORA(select "film_id" from "film_category" where "category_id"=2) as film_category
where film.film_id=film_category.film_id ;
quit ;
This simple example shows how to send a query directly to PostgreSQL and Oracle and get the results joined in CAS.
Explicit pass-through is triggered by the CONNECTION TO clause, is supported with every database CAS supports. The SQL enclosed in the CONNECTION TO clause is a SELECT-type query and must produce a result set that will be routed back to CAS. A Data Definition Language (DDL) statement will not work.
That’s all folks. Thanks for reading.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.