The SAS Viya platform supports single sign-on (SSO) authentication to the Azure Databricks environment with the CData and Databricks JDBC drivers. SSO is supported for both SPARK LIBNAME and SPARK CASLIB statements against the Databricks database. The LIBNAME or CASLIB statement must include the AUTH=OAUTH2 option to use an SSO-authenticated connection to Databricks. The CData JDBC driver is distributed with the SAS Viya software deployment and is the default driver for SAS connections to the Databricks database.
The SAS Viya platform can be configured with Microsoft Entra ID as an OIDC provider for initial user authentication. The steps to configure the SAS Viya platform to use OIDC with Microsoft Entra ID are listed in the SAS documentation.
In this post, I discuss the critical configuration parts of the Entra ID OIDC application, the Databricks workspace, and the user permissions at Databricks that are required for SSO-based access to the Databricks database.
Entra ID OIDC application configuration:
When creating an Entra ID OIDC application for Databricks access, apart from the standard API permissions, you must include the "Azure Databricks" API permission with the user_impersonation delegated role.
The Azure tenant admin must consent to these permissions; the status should show a green check mark with text like "Granted for <your organization>".
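As a sketch, the same permission can be added from the Azure CLI instead of the Azure portal. The commands below assume the well-known Azure Databricks resource application ID (2ff814a6-3304-4ab8-85cb-cd0e6f879c1d); $APP_ID and $SCOPE_ID are placeholders for your own app registration and for the user_impersonation scope ID, which the first command looks up.

```shell
# Look up the user_impersonation scope ID on the Azure Databricks service principal
az ad sp show --id 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
  --query "oauth2PermissionScopes[?value=='user_impersonation'].id" -o tsv

# Add the delegated permission to your OIDC app registration
# ($APP_ID is your app registration's ID, $SCOPE_ID is the ID found above)
az ad app permission add --id "$APP_ID" \
  --api 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
  --api-permissions "$SCOPE_ID=Scope"

# Grant tenant-wide admin consent (requires a tenant admin account)
az ad app permission admin-consent --id "$APP_ID"
```

After the consent step, the permission status in the portal should change to granted for the tenant.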
Databricks workspace configuration:
The Personal Access Token (PAT) option must be enabled in the Databricks workspace for the PAT-, SSO-, and service principal (SP)-based access paths to work. User processes that connect to Azure Databricks with SSO or SP authentication through the JDBC driver generate a PAT token at the Databricks instance, even though the connection itself uses SSO or SP credentials.
The Databricks admin user can add additional Entra ID users to the Databricks environment to grant them Databricks workspace and database table access.
All Entra ID users added to the Databricks workspace must be part of the admin group for SSO authentication to work. Behind the scenes, the OAuth process creates a Databricks PAT token for SSO users, and per the default Databricks configuration, only admin group users can create and manage PAT tokens at Databricks.
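If SSO connections fail with token errors, a workspace admin can confirm that PAT support is enabled through the Databricks workspace configuration REST API. The sketch below is a minimal example, assuming the environment variables $DATABRICKS_HOST (the workspace URL) and $DATABRICKS_TOKEN (an admin bearer token) are set; the enableTokensConfig key controls PAT availability.

```shell
# Check whether personal access tokens are enabled (admin token required);
# a response of {"enableTokensConfig":"true"} indicates PATs are enabled
curl -s -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  "https://$DATABRICKS_HOST/api/2.0/workspace-conf?keys=enableTokensConfig"

# Enable PAT support if it is currently off
curl -s -X PATCH -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{"enableTokensConfig": "true"}' \
  "https://$DATABRICKS_HOST/api/2.0/workspace-conf"
```

The same setting is available in the Databricks admin console under the workspace access control settings.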
SSO-based access from SAS Compute Server to Azure Databricks
When the SAS Viya platform is configured with the Entra ID OIDC application to provide initial user authentication and the Databricks workspace is configured with Entra ID user access permissions, SAS users can access the Azure Databricks database from the SAS Compute Server with SSO by using the SPARK LIBNAME engine with the AUTH=OAUTH2 option.
The following code describes the SSO-based access from the SAS Compute Server to the Azure Databricks database. Notice that there is no user ID or password; instead, the AUTH=OAUTH2 option is part of the LIBNAME statement.
Code:
%let MYDBRICKS=adb-3916581250425199.19.azuredatabricks.net;
%let MYHTTPPATH=sql/protocolv1/o/3916581250425199/0927-154850-ssq0x1nk;
%let MYDRIVERCLASS="cdata.jdbc.databricks.DatabricksDriver";
%let MYCATALOG=hive_metastore;
%let MYSCHEMA=default;
libname CdtSpark spark platform=databricks
driverClass=&MYDRIVERCLASS
auth=oauth2
server="&MYDBRICKS"
database="&MYSCHEMA"
httpPath="&MYHTTPPATH"
port=443
bulkload=no
character_multiplier=1
dbmax_text=50
properties="Catalog=&MYCATALOG;Other=ConnectRetryWaitTime=20;DefaultColumnSize=1024;"
;
proc sql outobs=5;
select * from CdtSpark.baseball_prqt ;
quit;
data CdtSpark.prdsal2_sas2 ;
set sashelp.prdsal2 ;
run;
Log:
80 %let MYDBRICKS=adb-3916581250425199.19.azuredatabricks.net;
81 %let MYHTTPPATH=sql/protocolv1/o/3916581250425199/0927-154850-ssq0x1nk;
82
83 %let MYDRIVERCLASS="cdata.jdbc.databricks.DatabricksDriver";
84 %let MYCATALOG=hive_metastore;
85 %let MYSCHEMA=default;
86
87 libname CdtSpark spark platform=databricks
88 driverClass=&MYDRIVERCLASS
89 auth=oauth2
90 server="&MYDBRICKS"
91 database="&MYSCHEMA"
92 httpPath="&MYHTTPPATH"
93 port=443
94 bulkload=no
95 character_multiplier=1
96 dbmax_text=50
97 properties="Catalog=&MYCATALOG;Other=ConnectRetryWaitTime=20;DefaultColumnSize=1024;"
98 ;
NOTE: Libref CDTSPARK was successfully assigned as follows:
Engine: SPARK
Physical Name:
jdbc:cdata:databricks:Server=adb-3916581250425199.19.azuredatabricks.net;Database=default;HTTPPath=sql/protocolv1/o/3916581250
425199/0927-154850-ssq0x1nk;QueryPassthrough=true;UseCloudFetch=true;InitiateOAuth=OFF;AuthScheme=AzureAD;OAuthAccessToken=***
***;Catalog=hive_metastore;Other=ConnectRetryWaitTime=20;DefaultColumnSize=1024;
99
100 Proc SQL outobs=5;
101 select * from CdtSpark.baseball_prqt ;
WARNING: Statement terminated early due to OUTOBS=5 option.
102 run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed page 1.
NOTE: PROCEDURE SQL used (Total process time):
real time 1.66 seconds
cpu time 0.07 seconds
103
104 data CdtSpark.prdsal2_sas2 ;
105 set sashelp.prdsal2 ;
106 run;
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 23040 observations read from the data set SASHELP.PRDSAL2.
NOTE: The data set CDTSPARK.PRDSAL2_SAS2 has 23040 observations and 11 variables.
107
Results:
SSO-based access from CAS to Azure Databricks
When the SAS Viya platform is configured with the Entra ID OIDC application to provide initial user authentication and the Databricks workspace is configured with Entra ID user access permissions, SSO-based access can be used to load CAS from the Azure Databricks database by using a SPARK CASLIB with the AUTH=OAUTH2 option.
The following code describes the SSO-based access from CAS to the Azure Databricks database. Notice that there is no user ID or password; instead, the AUTH=OAUTH2 option is part of the CASLIB statement.
Code:
%let MYDBRICKS=adb-3916581250425199.19.azuredatabricks.net;
%let MYHTTPPATH=sql/protocolv1/o/3916581250425199/0927-154850-ssq0x1nk;
%let MYDRIVERCLASS=cdata.jdbc.databricks.DatabricksDriver;
%let MYCATALOG=hive_metastore;
%let MYSCHEMA=default;
CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
/* SSO based CASLIB to Databricks */
caslib Cdtspkcaslib datasource=(srctype='spark',
platform=databricks,
auth=oauth2,
schema="&MYSCHEMA",
server="&MYDBRICKS",
httpPath="&MYHTTPPATH",
driverclass="&MYDRIVERCLASS",
bulkload=no,
port=443,
useSsl=yes,
charMultiplier=1,
dbmaxText=50,
properties="Catalog=&MYCATALOG;DefaultColumnSize=255;Other=ConnectRetryWaitTime=20"
);
/* Load CAS from Databricks database table */
proc casutil outcaslib="Cdtspkcaslib" incaslib="Cdtspkcaslib" ;
load casdata="iot_device" casout="iot_device" replace;
list tables;
quit;
/* Save CAS data to Databricks database table */
proc casutil outcaslib="Cdtspkcaslib" incaslib="Cdtspkcaslib";
load data=sashelp.cars casout="cars" replace;
save casdata="cars" casout="cars_sas" replace;
list files;
quit;
CAS mySession TERMINATE;
Log:
90 /* SSO based CASLIB to Databricks */
91 caslib Cdtspkcaslib datasource=(srctype='spark',
92 platform=databricks,
93 auth=oauth2,
94 schema="&MYSCHEMA",
95 server="&MYDBRICKS",
96 httpPath="&MYHTTPPATH",
97 driverclass="&MYDRIVERCLASS",
98 bulkload=no,
99 port=443,
100 useSsl=yes,
101 charMultiplier=1,
102 dbmaxText=50,
103 properties="Catalog=&MYCATALOG;DefaultColumnSize=255;Other=ConnectRetryWaitTime=20"
104 );
NOTE: Executing action 'table.addCaslib'.
NOTE: 'CDTSPKCASLIB' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'CDTSPKCASLIB'.
105
106 /* Load CAS from DataBricks database table */
107 proc casutil outcaslib="Cdtspkcaslib" incaslib="Cdtspkcaslib" ;
NOTE: The UUID 'c5e703f1-0786-fe4d-b8df-19f813db3445' is connected using session MYSESSION.
108 load casdata="iot_device" casout="iot_device" replace;
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to Spark.
NOTE: Cloud Analytic Services made the external data from iot_device available as table IOT_DEVICE in caslib Cdtspkcaslib.
NOTE: Action 'table.loadTable' used (Total process time):
111
112 /* Save CAS data to DataBricks database table */
113 proc casutil outcaslib="Cdtspkcaslib" incaslib="Cdtspkcaslib";
NOTE: The UUID 'c5e703f1-0786-fe4d-b8df-19f813db3445' is connected using session MYSESSION.
114 load data=sashelp.cars casout="cars" replace;
NOTE: The INCASLIB= option is ignored when using the DATA= option in the LOAD statement.
NOTE: Executing action 'table.addTable'.
NOTE: Action 'table.addTable' used (Total process time):
NOTE: SASHELP.CARS was successfully added to the "CDTSPKCASLIB" caslib as "CARS".
115 save casdata="cars" casout="cars_sas" replace;
NOTE: Executing action 'table.save'.
NOTE: Performing serial SaveTable action using SAS Data Connector to Spark.
NOTE: Cloud Analytic Services saved the file cars_sas in caslib CDTSPKCASLIB.
NOTE: Action 'table.save' used (Total process time):
Results:
Important Links:
Authenticate to Databricks on Microsoft Azure by Using Single Sign-On
Scenario: OIDC with Microsoft Entra ID (Azure Active Directory)
LIBNAME Statement for Spark
Find more articles from SAS Global Enablement and Learning here.