The SAS Viya platform supports Service Principal-based authentication to the Azure Databricks database with the CDATA JDBC driver. This access is available for both the SPARK LIBNAME and SPARK CASLIB statements against the Azure Databricks database. To connect with a Service Principal, the SAS user must include the Azure environment details, Databricks properties, and the Service Principal credentials in the properties= option of the LIBNAME or CASLIB statement.
In this post, I discuss configuring the Azure Service Principal, the Databricks workspace, and the Service Principal permissions at Databricks for access to the Azure Databricks database.
The CDATA JDBC driver is distributed with the SAS Viya software deployment and is the default driver for SAS users to connect to the Databricks database. With the CDATA JDBC connection string properties, SAS users can provide the Azure TenantId, SubscriptionId, ResourceGroup, Workspace, ClientId, ClientSecret, and AuthScheme values. When AuthScheme is set to AzureServicePrincipal, the JDBC connection acquires an access token from the Databricks environment using these connection properties. The Azure Service Principal (ClientId) must have a valid secret and the Databricks API permission as part of the configuration.
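For reference, the log later in this post shows the JDBC connection string that the SPARK engine builds from these options. It takes roughly this form (values are placeholders):
jdbc:cdata:databricks:Server=<workspace-host>;Database=<schema>;HTTPPath=<http-path>;Catalog=<catalog>;AuthScheme=AzureServicePrincipal;AzureTenantId=<tenant-id>;AzureClientId=<client-id>;AzureClientSecret=<client-secret>;AzureSubscriptionId=<subscription-id>;AzureResourceGroup=<resource-group>;AzureWorkspace=<workspace-name>;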
See the CDATA documentation on accessing Azure Databricks using a Service Principal.
Azure Service Principal (application) configuration:
When creating an Entra ID Service Principal (application) for Databricks access, in addition to the standard API permissions, you must include the “Azure Databricks” API permission with the user_impersonation delegated role.
The Azure tenant admin must consent to these roles, and the status must show green with text like “Granted for <your domain/organization>”.
The Azure Service Principal (ClientId) must also have a valid client secret.
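Before wiring the Service Principal into a SAS statement, you can verify that the secret and API permission are valid by requesting an Entra ID access token for the Azure Databricks resource with the client-credentials flow. The following is a minimal PROC HTTP sketch, not part of the original configuration steps: the macro variables match the ones used later in this post, and 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known application ID of the AzureDatabricks resource (verify it for your cloud environment).
/* Hedged sketch: confirm the Service Principal can obtain an Entra ID token */
filename reqbody temp;
filename resp temp;
data _null_;
   file reqbody recfm=n;            /* single record, no line breaks */
   put "client_id=&MYCLNT_ID" '26'x /* '26'x writes a literal ampersand */
       "client_secret=&MYCLNT_SECRET" '26'x
       "grant_type=client_credentials" '26'x
       "scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default";
run;
proc http
   url="https://login.microsoftonline.com/&MYTNTID/oauth2/v2.0/token"
   method="POST" in=reqbody out=resp
   ct="application/x-www-form-urlencoded";
run;
/* A JSON response containing "access_token" confirms the credential works */
data _null_; infile resp; input; put _infile_; run;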
Databricks Workspace Configuration:
The PAT (Personal Access Token) option must be enabled in the Databricks workspace for the PAT, SSO, and Service Principal (SP) access paths to work. Even when a user process connects to Azure Databricks with SSO or SP authentication through the JDBC driver, it initiates a PAT token at the Databricks instance behind the scenes.
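If you have a Databricks admin token, you can check whether PAT support is enabled through the workspace configuration REST API. The following is a hedged sketch: the /api/2.0/workspace-conf endpoint and the enableTokensConfig key come from the Databricks workspace configuration API, and <workspace-host> and <admin-access-token> are placeholders you must supply.
/* Hedged sketch: query the PAT setting; expect {"enableTokensConfig":"true"} */
filename wsconf temp;
proc http
   url="https://<workspace-host>/api/2.0/workspace-conf?keys=enableTokensConfig"
   method="GET" out=wsconf
   oauth_bearer="<admin-access-token>";
run;
data _null_; infile wsconf; input; put _infile_; run;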
Add the Entra ID Service Principal (application) to the Databricks environment for Databricks workspace and database table access.
Use the “Microsoft Entra ID Managed” option when adding it, and provide the application ID from the Azure Service Principal (application) configuration steps. Under the Entitlements section, select “Databricks SQL Access” and “Allow Workspace access” and confirm the status is Active.
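The same registration can also be scripted with the Databricks SCIM API. Below is a hedged PROC HTTP sketch, assuming the /api/2.0/preview/scim/v2/ServicePrincipals endpoint and the databricks-sql-access and workspace-access entitlement values from the Databricks SCIM documentation; <workspace-host>, <azure-client-id>, and <admin-access-token> are placeholders, and the display name is only an example.
/* Hedged sketch: add the Entra ID application as a Databricks service principal */
filename spbody temp;
data _null_;
   file spbody;
   put '{"applicationId":"<azure-client-id>",';   /* the Azure ClientId */
   put ' "displayName":"sas-viya-sp",';           /* example name only  */
   put ' "entitlements":[{"value":"databricks-sql-access"},';
   put '                 {"value":"workspace-access"}]}';
run;
filename spresp temp;
proc http
   url="https://<workspace-host>/api/2.0/preview/scim/v2/ServicePrincipals"
   method="POST" in=spbody out=spresp
   oauth_bearer="<admin-access-token>"
   ct="application/scim+json";
run;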
Grant users access to the Service Principal. The users or groups who need to use the Service Principal must be granted the “Service Principal: User” privilege; a user with only the “Service Principal: Manager” privilege does not automatically get the ability to use it. In this example, the “Service Principal: User” privilege is granted to the user (geldm).
On the Service Principal details page, under the Permissions tab, you can add users and update their privileges as required.
The Entra ID Service Principal (application) added to the Databricks workspace must also be part of the admin group for Service Principal-based authentication. Behind the scenes, the authentication process creates a Databricks PAT token for SSO or Service Principal users, and per the Databricks configuration, only admin group users can create and manage PAT tokens.
Service Principal-based access from SAS Compute Server to Azure Databricks
With the Service Principal configured with the required API permission and a valid credential, and the Databricks workspace set up with the Service Principal and its permissions, SAS users can specify the Service Principal in the properties= option of the LIBNAME statement to access the Azure Databricks database using the SPARK LIBNAME engine.
The LIBNAME statement with a Service Principal is supported with the CDATA JDBC driver. The driver is distributed with the SAS Viya software deployment and is the default driver for SAS users to connect to the Databricks database.
The following code illustrates Service Principal-based access from the SAS Compute Server to the Azure Databricks database. Notice that there is no user ID or password; instead, the Service Principal, credential, workspace name, and other details are in the properties= option of the LIBNAME statement.
Code:
%let MYRSGRP=clove-r-0302-viya4-data ;
%let MYAZWS=ws-clove-r-0302-viya4-data;
%let MYDBRICKS=adb-3290662697234059.19.azuredatabricks.net;
%let MYHTTPPATH=sql/protocolv1/o/3290662697234059/0916-145442-7dn19gjc;
%let MYTNTID=a708fb09-1d96-416a-ad34-72fa07ff196d;
%let MYSUBID=d588dad0-2004-4b14-a34e-6bf0519e32e4;
%let MYCLNT_ID=49200cb0-a10c-420e-8ba5-6d06ee6ac1f6;
/* AppReg client secret */
%let MYCLNT_SECRET=H8N8Q~XXXXXX~XXXXXXX;
%let MYCATALOG=hive_metastore;
%let MYSCHEMA=default;
/* Azure Service Principal based LIBNAME to Databricks */
libname spkdb spark platform=databricks
server="&MYDBRICKS"
database="&MYSCHEMA"
port=443
httppath="&MYHTTPPATH" properties="Catalog=&MYCATALOG;AuthScheme=AzureServicePrincipal;AzureTenantId=&MYTNTID;AzureClientId=&MYCLNT_ID;AzureClientSecret=&MYCLNT_SECRET;AzureSubscriptionId=&MYSUBID;AzureResourceGroup=&MYRSGRP;AzureWorkspace=&MYAZWS;QueryPassthrough=true;DefaultColumnSize=1024;Other=ConnectRetryWaitTime=20;"
bulkload=no
character_multiplier=1
dbmax_text=50;
Proc SQL outobs=5;
select * from spkdb.baseball_prqt ;
run;
quit;
data spkdb.prdsal2_sas3 ;
set sashelp.prdsal2 ;
run;
Log:
.........
.....
82 %let MYRSGRP=birch-p03073-viya4-data ;
83 %let MYAZWS=ws-birch-p03073-viya4-data;
84 %let MYDBRICKS=adb-3916581250425199.19.azuredatabricks.net;
85 %let MYHTTPPATH=sql/protocolv1/o/3916581250425199/0927-154850-ssq0x1nk;
86
87
98 /* Azure Service Principal based LIBNAME to Databricks */
99 libname spkdb spark platform=databricks
100 server="&MYDBRICKS"
101 database="&MYSCHEMA"
102 port=443
103 httppath="&MYHTTPPATH"
104 properties="Catalog=&MYCATALOG;AuthScheme=AzureServicePrincipal;AzureTenantId=&MYTNTID;AzureClientId=&MYCLNT_ID;
104! AzureClientSecret=&MYCLNT_SECRET;AzureSubscriptionId=&MYSUBID;AzureResourceGroup=&MYRSGRP;AzureWorkspace=&MYAZWS;
104! QueryPassthrough=true;DefaultColumnSize=1024;Other=ConnectRetryWaitTime=20;"
NOTE: The quoted string currently being processed has become more than 262 bytes long. You might have unbalanced quotation marks.
105 bulkload=no
106 character_multiplier=1
107 dbmax_text=50;
NOTE: Libref SPKDB was successfully assigned as follows:
Engine: SPARK
Physical Name:
jdbc:cdata:databricks:Server=adb-3916581250425199.19.azuredatabricks.net;Database=default;HTTPPath=sql/protocolv1/o/3916581250
425199/0927-154850-ssq0x1nk;QueryPassthrough=true;UseCloudFetch=true;Catalog=hive_metastore;AuthScheme=AzureServicePrincipal;A
zureTenantId=a708fb09-1d96-416a-ad34-72fa07ff196d;AzureClientId=49200cb0-a10c-420e-8ba5-6d06ee6ac1f6;AzureClientSecret=H8N8Q~v
2KTT1VCbjXXXXXXXXXX;AzureSubscriptionId=d588dad0-2004-4b14-a34e-6bf0519e32e4;AzureResourceGroup=birch-p03073-viy
a4-data;AzureWorkspace=ws-birch-p03073-viya4-data;QueryPassthrough=true;DefaultColumnSize=1024;Other=ConnectRetryWaitTime=20;
108
109 Proc SQL outobs=5;
110 select * from spkdb.baseball_prqt ;
WARNING: Statement terminated early due to OUTOBS=5 option.
111 run;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
112 quit;
NOTE: The PROCEDURE SQL printed page 3.
NOTE: PROCEDURE SQL used (Total process time):
real time 1.40 seconds
cpu time 0.07 seconds
113
114 data spkdb.prdsal2_sas3 ;
115 set sashelp.prdsal2 ;
116 run;
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: There were 23040 observations read from the data set SASHELP.PRDSAL2.
NOTE: The data set SPKDB.PRDSAL2_SAS3 has 23040 observations and 11 variables.
NOTE: DATA statement used (Total process time):
....
...........
Results:
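Beyond the implicit LIBNAME access shown above, the same Service Principal connection also supports explicit SQL pass-through from the compute server. The following is a minimal sketch, assuming the SPARK engine's CONNECT TO facility accepts the same connection options as the LIBNAME statement; it reuses the macro variables from the earlier code.
/* Hedged sketch: explicit pass-through with the same Service Principal options */
proc sql;
   connect to spark (platform=databricks
      server="&MYDBRICKS" port=443
      database="&MYSCHEMA" httppath="&MYHTTPPATH"
      properties="Catalog=&MYCATALOG;AuthScheme=AzureServicePrincipal;AzureTenantId=&MYTNTID;AzureClientId=&MYCLNT_ID;AzureClientSecret=&MYCLNT_SECRET;AzureSubscriptionId=&MYSUBID;AzureResourceGroup=&MYRSGRP;AzureWorkspace=&MYAZWS;");
   /* The inner query runs in Databricks; only the result returns to SAS */
   select * from connection to spark
      (select count(*) as nrows from baseball_prqt);
   disconnect from spark;
quit;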
Service Principal-based access from CAS to Azure Databricks
With the Service Principal configured with the required API permission and a valid credential, and the Databricks workspace set up with the Service Principal and its permissions, SAS users can specify the Service Principal in the properties= option of the CASLIB statement to access the Azure Databricks database and load data into CAS.
The CASLIB statement with a Service Principal is supported with the CDATA JDBC driver. The driver is distributed with the SAS Viya software deployment and is the default driver for SAS users to connect to the Databricks database.
The following code illustrates Service Principal-based access from CAS to the Azure Databricks database. Notice that there is no user ID or password; instead, the Service Principal, credential, workspace name, and other details are in the properties= option of the CASLIB statement.
Code:
%let MYRSGRP=clove-r-0302-viya4-data ;
%let MYAZWS=ws-clove-r-0302-viya4-data;
%let MYDBRICKS=adb-3290662697234059.19.azuredatabricks.net;
%let MYHTTPPATH=sql/protocolv1/o/3290662697234059/0916-145442-7dn19gjc;
%let MYTNTID=a708fb09-1d96-416a-ad34-72fa07ff196d;
%let MYSUBID=d588dad0-2004-4b14-a34e-6bf0519e32e4;
%let MYCLNT_ID=49200cb0-a10c-420e-8ba5-6d06ee6ac1f6;
/* AppReg client secret */
%let MYCLNT_SECRET=H8N8Q~XXXXXXXXXXXXXXXXXXX;
%let MYCATALOG=hive_metastore;
%let MYSCHEMA=default;
CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
/* Azure Service Principal based CASLIB to Databricks */
caslib Cdtspkcaslib datasource=(srctype='spark',
platform=databricks,
schema="&MYSCHEMA",
server="&MYDBRICKS",
httpPath="&MYHTTPPATH",
bulkload=no,
port=443,
properties="Catalog=&MYCATALOG;AuthScheme=AzureServicePrincipal;AzureTenantId=&MYTNTID;AzureClientId=&MYCLNT_ID;AzureClientSecret=&MYCLNT_SECRET;AzureSubscriptionId=&MYSUBID;AzureResourceGroup=&MYRSGRP;AzureWorkspace=&MYAZWS;QueryPassthrough=true;DefaultColumnSize=1024;Other=ConnectRetryWaitTime=20;"
) libref=dbxlib ;
/* Save CAS table to DataBricks database table */
proc casutil outcaslib="Cdtspkcaslib" incaslib="Cdtspkcaslib";
load data=sashelp.cars casout="cars" replace;
save casdata="cars" casout="cars_sas" replace;
list files;
quit;
/* Load CAS from DataBricks database table */
proc casutil outcaslib="Cdtspkcaslib" incaslib="Cdtspkcaslib" ;
load casdata="cars_sas" casout="cars_sas" replace;
list tables;
quit;
CAS mySession TERMINATE;
Log:
...
.......
97 /* Azure Service Principal based CASLIB to Databricks */
98 caslib Cdtspkcaslib datasource=(srctype='spark',
99 platform=databricks,
100 schema="&MYSCHEMA",
101 server="&MYDBRICKS",
102 httpPath="&MYHTTPPATH",
103 bulkload=no,
104 port=443,
105 properties="Catalog=&MYCATALOG;AuthScheme=AzureServicePrincipal;AzureTenantId=&MYTNTID;AzureClientId=&MYCLNT_ID;
105! AzureClientSecret=&MYCLNT_SECRET;AzureSubscriptionId=&MYSUBID;AzureResourceGroup=&MYRSGRP;AzureWorkspace=&MYAZWS;
105! QueryPassthrough=true;DefaultColumnSize=1024;Other=ConnectRetryWaitTime=20;"
NOTE: The quoted string currently being processed has become more than 262 bytes long. You might have unbalanced quotation marks.
106 ) libref=dbxlib ;
NOTE: Executing action 'table.addCaslib'.
NOTE: 'CDTSPKCASLIB' is now the active caslib.
….
….
….
107
108 /* Save CAS table to DataBricks database table */
109 proc casutil outcaslib="Cdtspkcaslib" incaslib="Cdtspkcaslib";
NOTE: The UUID '70d1e97b-dd79-564d-ae5b-91243d85c31e' is connected using session MYSESSION.
110 load data=sashelp.cars casout="cars" replace;
NOTE: The INCASLIB= option is ignored when using the DATA= option in the LOAD statement.
NOTE: Executing action 'table.addTable'.
NOTE: Action 'table.addTable' used (Total process time):
NOTE: real time 0.017585 seconds
NOTE: cpu time 0.029144 seconds (165.73%)
NOTE: total nodes 3 (12 cores)
NOTE: total memory 94.03G
NOTE: memory 3.51M (0.00%)
NOTE: bytes moved 68.08K
NOTE: SASHELP.CARS was successfully added to the "CDTSPKCASLIB" caslib as "CARS".
111 save casdata="cars" casout="cars_sas" replace;
NOTE: Executing action 'table.save'.
NOTE: Performing serial SaveTable action using SAS Data Connector to Spark.
NOTE: Cloud Analytic Services saved the file cars_sas in caslib CDTSPKCASLIB.
NOTE: Action 'table.save' used (Total process time):
NOTE: real time 17.310049 seconds
NOTE: cpu time 0.129744 seconds (0.75%)
NOTE: total nodes 3 (12 cores)
NOTE: total memory 94.03G
NOTE: memory 7.41M (0.01%)
NOTE: The Cloud Analytic Services server processed the request in 17.310049 seconds.
112 list files;
NOTE: Executing action 'table.caslibInfo'.
……
……
…..
115 /* Load CAS from DataBricks database table */
116 proc casutil outcaslib="Cdtspkcaslib" incaslib="Cdtspkcaslib" ;
NOTE: The UUID '70d1e97b-dd79-564d-ae5b-91243d85c31e' is connected using session MYSESSION.
117 load casdata="cars_sas" casout="cars_sas" replace;
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to Spark.
NOTE: Cloud Analytic Services made the external data from cars_sas available as table CARS_SAS in caslib Cdtspkcaslib.
NOTE: Action 'table.loadTable' used (Total process time):
NOTE: real time 3.580677 seconds
NOTE: cpu time 0.072706 seconds (2.03%)
NOTE: total nodes 3 (12 cores)
NOTE: total memory 94.03G
NOTE: memory 8.63M (0.01%)
NOTE: bytes moved 127.68K
NOTE: The Cloud Analytic Services server processed the request in 3.580677 seconds.
118 list tables;
…
..
Results:
Important Links: CDATA documentation - accessing Azure Databricks using a Service Principal