The SAS Viya platform supports access to the Databricks database through the SAS/ACCESS Interface to Spark components, and it includes the Simba Databricks JDBC driver to connect to the Databricks database. Databricks is a cloud-enabled, unified data platform for maintaining and sharing enterprise data with various applications. The SAS/ACCESS Interface to Spark contains two main components: the Spark LIBNAME statement, which enables the user to connect to Spark-compatible data sources from the SAS Compute Server, and the Spark data connector, which enables the user to connect to Spark-compatible data sources from CAS.
In this post, I discuss database access from the SAS Compute Server to Databricks using the Simba JDBC driver.
Databricks is a Spark data platform that runs in a hosted cloud environment, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure. The SAS/ACCESS Interface to Spark supports the following versions of Databricks.
The SAS/ACCESS Interface to Spark supports user ID and password-based authentication for the Databricks database. Single sign-on (SSO) access to Azure Databricks is supported when the SAS Viya platform is configured with Microsoft Entra ID.
The Simba Databricks JDBC Data Connector uses the Apache Arrow library to efficiently transfer result sets from Databricks to the client. The Arrow library requires the JRE option --add-opens=java.base/java.nio=ALL-UNNAMED to be passed to the SAS Java Runtime Environment at startup. This property must be set in the SAS Studio compute context and the SAS batch server context, a configuration step that is performed in SAS Environment Manager.
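As a sketch, the setting typically takes the following shape in SAS Environment Manager (the exact configuration path and the property name java_option_arrow are assumptions; verify them against your Viya release):

```
# SAS Environment Manager > Configuration > Definitions > sas.compute.server
# (and the equivalent batch server definition), jvm instance:
# add a property whose name starts with java_option_ and whose value is the JRE option.
java_option_arrow = --add-opens=java.base/java.nio=ALL-UNNAMED
```

The compute and batch servers pick up the new JVM option the next time a session is launched, so existing sessions must be restarted after the change.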
With the Simba Databricks JDBC driver, some option names have changed compared to the CData JDBC driver. With the Azure Databricks Workspace, Spark cluster, database table, and JDBC driver in place, you can use the following code to read and write data from the Azure Databricks table. The SAS/ACCESS Interface to Spark uses the Simba JDBC Spark driver to connect to the Databricks database, and the Azure Databricks Workspace token (key) is used as the password to authenticate to the environment. To access the Unity Catalog from the database, use the ConnCatalog= and ConnSchema= parameters in the URL option.
Code:
%let MYDBRICKS=adb-2814985338120447.7.azuredatabricks.net;
%let MYHTTPPATH=sql/protocolv1/o/2814985338120447/1114-182334-mabjpu68;
%let MYPWD=dapi16d43054a42df2b7ef4d197d6afdaeff-3;
%let MYCATALOG=ws_acorn_p41691_viya4_data;
%let MYUID=token;
%let MYDRIVERCLASS="com.simba.databricks.jdbc.Driver";
%let MYSCHEMA=default;
libname dbricks spark platform=databricks
driverClass=&MYDRIVERCLASS
user=&MYUID
password="&MYPWD"
schema=&MYSCHEMA
bulkload=no
character_multiplier=1
dbmax_text=50
PRESERVE_TAB_NAMES=YES
url="jdbc:databricks://&MYDBRICKS:443/;transportMode=http;ssl=1;AuthMech=3;httppath=&MYHTTPPATH;defaultStringColumnLength=255;useNativeQuery=1;TemporarilyUnavailableRetryTimeout=1800;ConnCatalog=&MYCATALOG;ConnSchema=&MYSCHEMA"
;
Proc SQL outobs=10;
select * from dbricks.iot_device ;
run;quit;
data dbricks.prdsal2_sas1;
set sashelp.prdsal2 ;
run;
Log:
98
99 libname dbricks spark platform=databricks
100 driverClass=&MYDRIVERCLASS
101 user=&MYUID
102 password="&MYPWD"
103 schema=&MYSCHEMA
104 bulkload=no
105 character_multiplier=1
106 dbmax_text=50
107 PRESERVE_TAB_NAMES=YES
NOTE: The quoted string currently being processed has become more than 262 bytes long. You might have unbalanced quotation marks.
108 url="jdbc:databricks://&MYDBRICKS:443/;transportMode=http;ssl=1;AuthMech=3;httppath=&MYHTTPPATH;defaultStringColumnLength=255;
108! useNativeQuery=1;TemporarilyUnavailableRetryTimeout=1800;ConnCatalog=&MYCATALOG;ConnSchema=&MYSCHEMA"
109 ;
NOTE: Libref DBRICKS was successfully assigned as follows:
Engine: SPARK
Physical Name: jdbc:databricks://adb-985508874615261.1.azuredatabricks.net:443/
110
111
112 Proc SQL outobs=10;
113 select * from dbricks.iot_device ;
WARNING: Statement terminated early due to OUTOBS=10 option.
114 run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed page 3.
NOTE: PROCEDURE SQL used (Total process time):
real time 2.36 seconds
cpu time 0.08 seconds
115
116 data dbricks.prdsal2_sas1;
117 set sashelp.prdsal2 ;
118 run;
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 23040 observations read from the data set SASHELP.PRDSAL2.
NOTE: The data set DBRICKS.prdsal2_sas1 has 23040 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 1:02.65
cpu time 10.48 seconds
Single sign-on access to Azure Databricks is supported through the Simba JDBC driver when the SAS Viya platform is configured with Microsoft Entra ID. The LIBNAME statement must include the option AUTH=OAUTH2 to use an SSO-authenticated connection to Databricks.
The SAS Viya platform can be configured with Microsoft Entra ID as an OIDC provider for initial user authentication. When creating an Entra ID OIDC application for Databricks access, apart from the standard API permissions, you must include the "Azure Databricks" API permission with the user_impersonation delegated role. The Azure tenant admin must consent to these roles, and the status must show as granted for your domain/organization.
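The same permission grant can be sketched with the Azure CLI. The commands below are illustrative, not a definitive procedure: the client ID and scope ID placeholders must be filled in from your tenant, and the AzureDatabricks resource application ID shown is the commonly published well-known value, which you should verify in the Azure portal.

```
# Sketch: add the Azure Databricks user_impersonation delegated permission
# to your Entra ID OIDC application.
# 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known AzureDatabricks resource
# (assumption - confirm in your tenant).
az ad app permission add \
  --id <YOUR-APP-CLIENT-ID> \
  --api 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
  --api-permissions <USER-IMPERSONATION-SCOPE-ID>=Scope

# Tenant admin consent, after which the portal shows the permission as granted:
az ad app permission admin-consent --id <YOUR-APP-CLIENT-ID>
```

Granting consent through the Azure portal (API permissions > Grant admin consent) achieves the same result as the second command.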
When the SAS Viya platform is configured with the Entra ID OIDC application for initial user authentication, and the Databricks Workspace is configured with Entra ID user access permissions, SAS users can use SSO-based access from the SAS Compute Server to the Azure Databricks database through the SPARK LIBNAME engine with the AUTH=OAUTH2 option.
The following code demonstrates SSO-based access from the SAS Compute Server to the Azure Databricks database. Notice that there is no user ID or password in the LIBNAME statement; if you specify the user ID and password options in the LIBNAME statement, the SSO authentication path is not followed.
Code:
%let MYDBRICKS=adb-985508874615261.1.azuredatabricks.net;
%let MYHTTPPATH=sql/protocolv1/o/985508874615261/1120-220536-9iqbz6rl;
%let MYCATALOG=ws_cloak_p41844_rg;
%let MYDRIVERCLASS="com.simba.databricks.jdbc.Driver";
%let MYSCHEMA=default;
libname dbricks spark platform=databricks
driverClass=&MYDRIVERCLASS
auth=oauth2
schema=&MYSCHEMA
bulkload=no
character_multiplier=1
dbmax_text=50
PRESERVE_TAB_NAMES=YES
url="jdbc:databricks://&MYDBRICKS:443/;transportMode=http;ssl=1;AuthMech=11;Auth_Flow=0;httppath=&MYHTTPPATH;defaultStringColumnLength=255;useNativeQuery=1;TemporarilyUnavailableRetryTimeout=1800;ConnCatalog=&MYCATALOG;ConnSchema=&MYSCHEMA"
;
Proc SQL outobs=10;
select * from dbricks.iot_device ;
run;quit;
data dbricks.prdsal2_sas2;
set sashelp.prdsal2 ;
run;
Log:
97
98 libname dbricks spark platform=databricks
99 driverClass=&MYDRIVERCLASS
100 auth=oauth2
101 schema=&MYSCHEMA
102 bulkload=no
103 character_multiplier=1
104 dbmax_text=50
105 PRESERVE_TAB_NAMES=YES
NOTE: The quoted string currently being processed has become more than 262 bytes long. You might have unbalanced quotation marks.
106 url="jdbc:databricks://&MYDBRICKS:443/;transportMode=http;ssl=1;AuthMech=11;Auth_Flow=0;httppath=&MYHTTPPATH;
106! defaultStringColumnLength=255;useNativeQuery=1;TemporarilyUnavailableRetryTimeout=1800;ConnCatalog=&MYCATALOG;
106! ConnSchema=&MYSCHEMA"
107 ;
NOTE: Libref DBRICKS was successfully assigned as follows:
Engine: SPARK
Physical Name:
jdbc:databricks://adb-985508874615261.1.azuredatabricks.net:443/default;transportMode=http;ssl=1;AuthMech=11;Auth_Flow=0;httpp
ath=sql/protocolv1/o/985508874615261/1120-220536-9iqbz6rl;defaultStringColumnLength=255;useNativeQuery=1;TemporarilyUnavailabl
eRetryTimeout=1800;ConnCatalog=ws_cloak_p41844_rg;ConnSchema=default;Auth_AccessToken=XXXXXX
108
109 Proc SQL outobs=10;
110 select * from dbricks.iot_device ;
WARNING: Statement terminated early due to OUTOBS=10 option.
111 run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed page 1.
NOTE: PROCEDURE SQL used (Total process time):
real time 3.61 seconds
cpu time 0.08 seconds
112
113 data dbricks.prdsal2_sas2;
114 set sashelp.prdsal2 ;
115 run;
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 23040 observations read from the data set SASHELP.PRDSAL2.
NOTE: The data set DBRICKS.prdsal2_sas2 has 23040 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 1:36.16
cpu time 10.63 seconds
Important Links:
Authenticate to Databricks on Microsoft Azure by Using Single Sign-On
Scenario: OIDC with Microsoft Entra ID (Azure Active Directory)
Find more articles from SAS Global Enablement and Learning here.