SAS Communities Library

We’re smarter together. Learn from this collection of community knowledge and add your expertise.
BookmarkSubscribeRSS Feed

CAS Accessing Azure DataBricks SPARK cluster

Started ‎11-29-2021 by
Modified ‎11-29-2021 by
Views 6,267

With SAS Viya 2021.2 LTS release(Stable 2021.1.5/6), Viya users can access the Azure Databricks SPARK (Ver 3 onwards) cluster using the SPARK data connector. The SPARK Data connector enables the user to read and write data tables to the Azure Databricks database. The SPARK Data connector supports serial data transfer between CAS and Azure Databricks database.

 

What is Azure Databricks?

 

Azure Databricks is optimized for Azure data lakes and provides an interactive workspace to set up the environment and collaborate amongst the data scientists. Azure Databricks uses the SPARK engine to process the data.

 

This article is about CAS accessing the Azure Databricks SPARK (Ver 3) database table using the SPARK data connector.

 

Pre-requisites

  • SAS SPARK Data Connector at CAS controller (comes with Viya deployment).
  • Access token from Azure Databricks Workspace.
  • Azure Databricks with SPARK (Ver 3..) cluster.
  • Databricks JDBC Jar file available at CAS controller.
  • SPARK (Ver 3) cluster and JDBC URL information.

 

Data access path

The following picture describes the SAS Viya(CAS) environment access to the Azure Databricks database table.

 

ut_1_CAS_Accesing_SPARK_AzureDataBricks_1.png

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

Azure Databricks workspace Token

The Databricks workspace is the entry point for external applications to access the objects and data from the Databricks SPARK cluster. The Databricks workspace user credential is required to connect to the SPARK cluster from an external application.

 

The following screen describes the user credential (Token) to access the SPARK cluster.

 

uk_2_CAS_Accesing_SPARK_AzureDataBricks_2.png

 

Azure Databricks SPARK 3 cluster

The SPARK data connector supports the data access from a cluster started with SPARK Ver 3 and up. The following screen describes the creation of the SPARK 3 cluster under Azure Databricks Workspace.

 

 

uk_3_CAS_Accesing_SPARK_AzureDataBricks_3.png

 

Azure Databricks SPARK cluster connection information is available at the cluster configuration tab.

 

uk_4_CAS_Accesing_SPARK_AzureDataBricks_4.png

 

Azure Databricks JDBC driver

The third-party application can access to Databricks table using the JDBC driver. The JDBC Driver is available at the following link.

 

Databricks JDBC Driver download

CAS Access to Azure Databricks SPARK cluster

With Azure Databricks Workspace Token, SPARK-3 Cluster, and Databricks JDBC driver in place, you can use the following code for serial load/save to CAS from the Azure Databricks table. The Azure Databricks Workspace token (key) is used as the password to authenticate to the Databricks environment.

 

Code:

 

/* Note : variable value in quotes generate errors, So keep it without quotes. */
%let MYDBRICKS=adb-7060859955656306.6.azuredatabricks.net;
%let MYPWD=dapiaa66843abadb51775a9dd7858d6980aa-2;
%let MYHTTPPATH=sql/protocolv1/o/7060859955656306/0210-155120-shop163;
%let MYUID=token;

CAS mySession  SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);

caslib spkcaslib dataSource=(srctype='spark',
           url="jdbc:spark://&MYDBRICKS:443/default;transportMode=http;ssl=1;httpPath=&MYHTTPPATH;AuthMech=3;UID=&MYUID;PWD=&MYPWD"
           driverclass="com.simba.spark.jdbc.Driver",
           classpath="/mnt/myazurevol/config/access-clients/JDBC",
           BULKLOAD=NO,
           schema="default" );

proc casutil outcaslib="spkcaslib" incaslib="spkcaslib" ;
list files ;
quit;

/* Load CAS from DataBricks database table */
proc casutil outcaslib="spkcaslib" incaslib="spkcaslib" ;
	load casdata="iot_device" casout="iot_device" replace;
    list tables;
quit;

/* Save CAS table to DataBricks database */
proc casutil outcaslib="spkcaslib" incaslib="spkcaslib";
load data=sashelp.cars casout="cars" replace;
save casdata="cars" casout="cars_sas"  replace;
list files;
quit;

CAS mySession  TERMINATE;

 

Log extract :

 

....
..............
88   caslib spkcaslib dataSource=(srctype='spark',
89              url="jdbc:spark://&MYDBRICKS:443/default;transportMode=http;ssl=1;httpPath=&MYHTTPPATH;AuthMech=3;UID=&MYUID;
89 ! PWD=&MYPWD"
90              driverclass="com.simba.spark.jdbc.Driver",
91              classpath="/mnt/myazurevol/config/access-clients/JDBC",
92              BULKLOAD=NO,
93              schema="default" );
NOTE: Executing action 'table.addCaslib'.
NOTE: 'SPKCASLIB' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'SPKCASLIB'.
....
..............
99   /* Load CAS from DataBricks database table */
100  proc casutil outcaslib="spkcaslib" incaslib="spkcaslib" ;
NOTE: The UUID '4eaf8859-dc77-ee40-8259-41e1b897e44f' is connected using session MYSESSION.
101
101!  load casdata="iot_device" casout="iot_device" replace;
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to Spark.
NOTE: Cloud Analytic Services made the external data from iot_device available as table IOT_DEVICE in caslib spkcaslib.
....
..............
105  /* Save CAS data to DataBricks database */
106  proc casutil outcaslib="spkcaslib" incaslib="spkcaslib";
NOTE: The UUID '4eaf8859-dc77-ee40-8259-41e1b897e44f' is connected using session MYSESSION.
107  load data=sashelp.cars casout="cars" replace;
NOTE: The INCASLIB= option is ignored when using the DATA= option in the LOAD statement.
NOTE: Executing action 'table.addTable'.
NOTE: SASHELP.CARS was successfully added to the "SPKCASLIB" caslib as "CARS".
108  save casdata="cars" casout="cars_sas"  replace;
NOTE: Executing action 'table.save'.
NOTE: Performing serial SaveTable action using SAS Data Connector to Spark.
NOTE: Cloud Analytic Services saved the file cars_sas in caslib SPKCASLIB.
....
..............
.................

 

Result Output:

 

CAS table saved to Azure Databricks database.

 

uk_5_CAS_Accesing_SPARK_AzureDataBricks_5.png

 

 

Important Links:

What is Azure Databricks ?

Accessing Azure Databricks from SAS Viya (CAS)

Accessing Azure Databricks from SAS 9.4

  

Find more articles from SAS Global Enablement and Learning here.

Comments

Hi 

I'm not sure if Databricks driver has been changed but when tested I got error when trying to list files from the first code. I'm using JDBC42-2.6.32 driver.

The working CASLIB statement in my case is as follows:

CASLIB dbspark dataSource=(srctype='spark',
url="jdbc:databricks://ServerHostname:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/8982188031797693/0201-012252-gaq83b6a;AuthMech=3;UID=token;PWD=AccessToken",
driverclass="com.databricks.client.jdbc.Driver",
classpath="/data-drivers/jdbc/databricks",BULKLOAD=NO,schema="default" );

Regards

Zbyszek

Version history
Last update:
‎11-29-2021 07:12 PM
Updated by:
Contributors

sas-innovate-white.png

Join us for our biggest event of the year!

Four days of inspiring keynotes, product reveals, hands-on learning opportunities, deep-dive demos, and peer-led breakouts. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

SAS AI and Machine Learning Courses

The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.

Get started

Article Tags