With SAS Viya 4 release, Viya(CAS) user can access Azure Synapse SQL-Pool (formerly SQL DW) database table using MS-SQL Server SAS Data Connector. It also enables SAS Viya users to load CAS from various types of data files (parquet, orc, Json, etc..) stored at ADLS2 via Azure Synapse. Users can read these data files into a columnar data table under Azure Synapse before loading them into CAS.
This post is about accessing Azure Synapse SQL Pool (SQL DW) database from SAS Viya 4(CAS) environment.
What is Azure Synapse SQL Pool ?
Azure Synapse SQL Pool (formerly SQL DW) represents a collection of analytic resources that are provisioned when using Synapse SQL. Azure Synapse SQL Pool is the MS-SQL server database running at cloud in distributed mode on n-number of servers/machines.
The MS-SQL Server SAS Data Connector enables Viya(CAS) users to access the Azure Synapse database. A user can use either serial or multi-node data load method to load CAS from the Azure Synapse database table. It also supports BULK data save from CAS table to Azure Synapse SQL Pool table using Azure Storage for intermediate data files. The Bulk load process uploads the large/bulk data into an ADLS2 Blob Container before using the COPY INTO statement to push the data into the Synapse database table.
The following picture describes the SAS Viya 4(CAS) environment access to Azure Synapse SQL Pool (SQL DW) database table. It also describes the data path for BULK data upload from CAS to the Azure Synapse Database table. A user can use either the Serial or Multi-node data load method to load CAS from the Azure Synapse database table.
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
To bulk data load to Azure Synapse SQL Pool database from CAS table, a user must have Azure storage location defined with read-write permission. An operational Azure Application enabled with native communication to obtain Azure Access Key at CAS nodes (PODs).
The details steps are discussed in article Cas Accessing Azure Data-lake Files, for configuring Azure user application and Storage Account to access ALDS2 Blob storage.
To bulk load the Azure Synapse database, the MS-SQL server requires access to Azure ADLS2 Blob Storage with read-write permission. You can use Storage Shared Access Signature(SAS) token in the CASLIB statement or configure the MS-SQL server with “--assign-identity” and assign the “Storage Blob Data Contributor” role to the SQL server “principle id” for seamless access to Blob Storage. The following Azure CLI describes the creation of the MS-SQL Server with “--assign-identity”, and role assignment to Server Principle id.
CLI Code:
STUDENT=”Myname”
AKS_MY_LOCATION=”eastus2”
AKS_MY_RESOURCE_GROUP="${STUDENT}viya4"
AKS_MY_USER=viyadep
srv=sqlsrv
sql_server_name=$AKS_MY_RESOURCE_GROUP$srv
sql_server_admin_user=$AKS_MY_USER
sql_server_admin_pwd=”XXXXXXXXX”
sqldw_name=geldwpool
### Create a SQL Server with –assign-identity ####
sql_serv_sts=`az sql server create \
--name $sql_server_name \
--admin-user $sql_server_admin_user \
--admin-password $sql_server_admin_pwd \
--resource-group $AKS_MY_RESOURCE_GROUP \
--location $AKS_MY_LOCATION \
--enable-public-network true \
--assign-identity \
-o tsv `
### Create an Azure Synapse MS-SQL POOL Database in SQL Server ###
sql_serv_dw_sts=`az sql dw create \
--name $sqldw_name \
--server $sql_server_name \
--resource-group $AKS_MY_RESOURCE_GROUP \
--service-objective DW100c \
--tags name=$STUDENT \
--zone-redundant false `
### Assign Storage Blob Data Contributor Role to MS-SQL server managed Identity at Resource group level scope .###
spID=`az resource list -n $sql_server_name --query [*].identity.principalId --out tsv `
rgID=`az group show -g $AKS_MY_RESOURCE_GROUP --query id -o tsv`
server_role_tst=`az role assignment create --assignee $spID --role 'Storage Blob Data Contributor' --scope $rgID`
A customized DSN entry in the odbc.ini file to access the Azure SQL Pool database with hostname, database name, and SSL lib path, etc.
……. ……………… [sqls_gelws] Driver=/opt/sas/viya/home/lib64/accessclients/lib/S0sqls28.so Description=SAS ACCESS to MS SQL Server AEKeyCacheTTL=-1 AEKeystorePrincipalId= AEKeystoreClientSecret= AlternateServers= AlwaysReportTriggerResults=0 AnsiNPW=1 ApplicationIntent=0 ApplicationName= ApplicationUsingThreads=1 AuthenticationMethod=1 BulkBinaryThreshold=32 BulkCharacterThreshold=-1 BulkLoadBatchSize=1024 BulkLoadFieldDelimiter= BulkLoadOptions=2 BulkLoadRecordDelimiter= BulkLoadThreshold=2 ColumnEncryption=Disabled ConnectionReset=0 ConnectionRetryCount=0 ConnectionRetryDelay=3 CryptoProtocolVersion=TLSV1,TLSV1.1,TLSV1.2,SSLv3, SSLv2 Database=geldwpool Domain= EnableBulkLoad=0 EnableQuotedIdentifiers=0 EnableServerSideCursors=1 EnableScrollableCursors=4 EncryptionMethod=1 FailoverGranularity=0 FailoverMode=0 FailoverPreconnect=0 FetchTSWTZasTimestamp=0 FetchTWFSasTime=1 GSSClient=native HostName=utkumaviya4sqlsrv.database.windows.net HostNameInCertificate= #IANAAppCodePage= InitializationString= KeepAlive=0 Language= LoadBalanceTimeout=0 LoadBalancing=0 LoginTimeout=15 LogonID=viyadep MaxPoolSize=100 MinPoolSize=0 MultiSubnetFailover=0 PacketSize=-1 Pooling=0 PortNumber=1433 PRNGSeedFile=/dev/random PRNGSeedSource=0 ProxyHost= ProxyMode=0 ProxyPassword= ProxyPort= ProxyUser= QueryTimeout=0 ReportCodepageConversionErrors=0 SnapshotSerializable=0 TrustStore= TrustStorePassword= ValidateServerCertificate=0 WorkStationID= XMLDescribeType=-10 SSLLibName=/usr/lib64/libssl.so.10 CryptoLibName=/usr/lib64/libcrypto.so.10 ……. ………………
With Viya4 (CAS) environment configured with customized DSN entry in odbc.ini, you can use the following code to read and write CAS data table into Azure Synapse Database. The code is using the serial method to load and save CAS data.
Code:
CAS mySession SESSOPTS=(CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
%let MYUID='viyadep';
%let MYPWD='XXXXXXX';
%let MYDSN='sqls_gelws' ;
%let MYSCHEMA='dbo';
caslib azsqlws desc='Microsoft SQL Server Caslib'
dataSource=(srctype='sqlserver',
username=&MYUID,
password=&MYPWD,
schema=&MYSCHEMA,
sqlserver_dsn=&MYDSN );
/* ## Save CAS data to Azure Synapse pool database table */
proc casutil outcaslib="azsqlws" incaslib="azsqlws";
load data=sashelp.cars casout="cars" replace;
save casdata="cars" casout="cars" replace;
list files;
quit;
/* ## CAS load from Azure Synapse poll database table. */
proc casutil incaslib="azsqlws" outcaslib="azsqlws";
load casdata="cars" casout="cars_new" replace;
list tables;
quit;
CAS mySession TERMINATE;
Log extract :
........
...............
83
84 caslib azsqlws desc='Microsoft SQL Server Caslib'
85 dataSource=(srctype='sqlserver',
86 username=&MYUID,
87 password=&MYPWD,
88 sqlserver_dsn=&MYDSN );
NOTE: Executing action 'table.addCaslib'.
NOTE: 'AZSQLWS' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'AZSQLWS'.
NOTE: Action 'table.addCaslib' used (Total process time):
NOTE: Action to ADD caslib AZSQLWS completed for session MYSESSION.
89
78 /* Save CAS data to Azure Synapse pool database table */
79 proc casutil outcaslib="azsqlws" incaslib="azsqlws";
NOTE: The UUID 'f1cbd14f-2b65-6842-b82d-581685eb4072' is connected using session MYSESSION.
80 load data=sashelp.cars casout="cars" replace;
NOTE: The INCASLIB= option is ignored when using the DATA= option in the LOAD statement.
NOTE: Executing action 'table.addTable'.
NOTE: Action 'table.addTable' used (Total process time):
NOTE: SASHELP.CARS was successfully added to the "AZSQLWS" caslib as "CARS".
81 save casdata="cars" casout="cars" replace;
NOTE: Executing action 'table.save'.
NOTE: Performing serial SaveTable action using SAS Data Connector to SQL
NOTE: Cloud Analytic Services saved the file cars in caslib AZSQLWS.
NOTE: Action 'table.save' used (Total process time):
NOTE: The Cloud Analytic Services server processed the request in 47.475573 seconds.
78 /* CAS load from Azure Synapse poll database table. */
79 proc casutil incaslib="azsqlws" outcaslib="azsqlws";
NOTE: The UUID 'f1cbd14f-2b65-6842-b82d-581685eb4072' is connected using session MYSESSION.
80 load casdata="cars" casout="cars_new" replace;
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to SQL Server.
WARNING: WARNING: [SAS][ODBC SQL Server Wire Protocol driver]Setting QuotedIdentifier to 'OFF' is not supported with this server.
The value of EnabledQuotedIdentifiers has been changed to 1
NOTE: Cloud Analytic Services made the external data from cars available as table CARS_NEW in caslib azsqlws.
NOTE: Action 'table.loadTable' used (Total process time):
............
.................
Results :
With Viya4 (CAS) environment configured with customized DSN entry in odbc.ini, the following code can be used to load CAS from the Azure Synapse table using the multi-node method. The CASLIB parameter numreadnodes= and numwritenodes= trigger the multi-node data load and save method. The odbc.ini is centrally located at Azure file share and available to all the CAS nodes (PODs).
Code:
CAS mySession SESSOPTS=(CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
%let MYUID='viyadep';
%let MYPWD='XXXXXXXX';
%let MYDSN='sqls_gelws' ;
%let MYSCHEMA='dbo';
caslib azsqlws desc='Microsoft SQL Server Caslib'
dataSource=(srctype='sqlserver',
username=&MYUID,
password=&MYPWD,
schema=&MYSCHEMA,
sqlserver_dsn=&MYDSN,
numreadnodes=10,
numwritenodes=10
);
proc casutil incaslib="azsqlws" outcaslib="azsqlws";
load casdata="fish_sas" casout="fish_sas" options=(sliceColumn="weight") replace;
list tables;
quit;
CAS mySession TERMINATE;
Log extract :
........
.............
84 caslib azsqlws desc='Microsoft SQL Server Caslib'
85 dataSource=(srctype='sqlserver',
86 username=&MYUID,
87 password=&MYPWD,
88 sqlserver_dsn=&MYDSN
89 numreadnodes=10,numwritenodes=10,
93 );
NOTE: Executing action 'table.addCaslib'.
NOTE: 'AZSQLWS' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'AZSQLWS'.
NOTE: Action 'table.addCaslib' used (Total process time):
77
78 proc casutil incaslib="azsqlws" outcaslib="azsqlws";
NOTE: The UUID 'e6ff8b32-53b2-7548-8ea8-ae83d9734618' is connected using session MYSESSION.
79 load casdata="fish_sas" casout="fish_sas" options=(sliceColumn="weight") replace;
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to SQL Server.
WARNING: WARNING: [SAS][ODBC SQL Server Wire Protocol driver]Setting QuotedIdentifier to 'OFF' is not supported with this server.
The value of EnabledQuotedIdentifiers has been changed to 1
WARNING: The value of numReadNodes(10) exceeds the number of available worker nodes(3). The load will proceed with numReadNodes=3.
NOTE: Cloud Analytic Services made the external data from fish_sas available as table FISH_SAS in caslib azsqlws.
NOTE: Action 'table.loadTable' used (Total process time):
NOTE: real time 4.875197 seconds
............
.................
Results :
With SAS Viya 4 (CAS) environment configured with customized DSN entry in odbc.ini, MS-SQL Server configured to access ADLS2 blob storage, and Azure user application configured to access resources using device code method, you can use the following code to BULK load CAS table to Azure Synapse database table. The odbc.ini and Azure Access Key are centrally located at Azure file share and available to all the CAS nodes (Pods).
The CAS table Bulk load to Azure Synapse works in a multi-step process. In the first step, CAS creates an empty table structure at the Azure synapse database using a serial database connection. In the second step, BULK loads the data file from all CAS nodes to ADLS2 storage. In the third step, it submits a COPY INTO statement to the SQL database server to push the data from the ADLS2 storage to the Synapse database table.
Very first-time usage of CASLIB with BULK load statement generates an error message for Azure authentication. The SAS log displays the Microsoft device authentication URL and device code to authenticate and generate Azure Access Key.
To bulk load CAS table with a special character (like “”,’,`,!,#,$,@, etc.) in data elements use “ESCAPE=YES” in the CASLIB statement. The use of this option in the CASLIB statement will slow down the performance as it requires additional scanning/processing of data elements.
To verify the temporary files at the ADLS2 storage location for debugging, use deleteDataFiles=false in the CASLIB statement
Code:
%let MYSTRGACC="utkumaviya4adls2";
%let MYSTRGFS="fsdata";
%let MYFOLDER="bl_data";
%let MYAZUID="MANAGED IDENTITY";
%let MYDNSSUFFIX="dfs.core.windows.net" ;
%let MYTNTID="b1c14d5c-3625-45b3-XXXXXXXXXXXXXXXXXX";
%let MYAPPID="a2e7cfdc-93f8-4f12-XXXXXXXXXXXXXXXXXX";
%let MYUID='viyadep';
%let MYPWD='XXXXXXXXX';
%let MYDSN='sqls_gelws';
%let MYSCHEMA='dbo';
CAS mySession SESSOPTS=(azuretenantid=&MYTNTID CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
caslib azsqlws desc='Microsoft SQL Server Caslib'
dataSource=(srctype='sqlserver',
username=&MYUID,
password=&MYPWD,
Schema=&MYSCHEMA,
sqlserver_dsn=&MYDSN ,
identity=&MYAZUID ,
bulkload='true',
FOLDER=&MYFOLDER,
accountname=&MYSTRGACC,
filesystem=&MYSTRGFS,
dnsSuffix=dfs.core.windows.net,
applicationId=&MYAPPID,
);
/* ## Save CAS data to Azure Synapse pool database table */
proc casutil outcaslib="azsqlws" incaslib="azsqlws";
load data=sashelp.prdsal2 casout="prdsal2" replace;
save casdata="prdsal2" casout="prdsal2" replace;
list files;
quit;
CAS mySession TERMINATE;
Very-first time Error Log:
…………
…………………………..
ERROR: To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code XXXXXXXXX to
authenticate.
…………
…………………………..
Log extract :
.....
.........
93
94 caslib azsqlws desc='Microsoft SQL Server Caslib'
95 dataSource=(srctype='sqlserver',
96 username=&MYUID,
97 password=&MYPWD,
98 Schema=&MYSCHEMA,
99 sqlserver_dsn=&MYDSN ,
100 identity=&MYAZUID ,
101 bulkload='true',
102 FOLDER=&MYFOLDER,
103 accountname=&MYSTRGACC,
104 filesystem=&MYSTRGFS,
105 dnsSuffix=dfs.core.windows.net,
106 applicationId=&MYAPPID
108 );
NOTE: Executing action 'table.addCaslib'.
NOTE: 'AZSQLWS' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'AZSQLWS'.
....
............
117 /* ## Save CAS data to Azure Synapse pool database table */
118 proc casutil outcaslib="azsqlws" incaslib="azsqlws";
NOTE: The UUID '1b53fcc7-1f8e-1a48-8d22-5e90a0641bb1' is connected using session MYSESSION.
........
................
120 save casdata="prdsal2" casout="prdsal2" replace;
NOTE: Executing action 'table.save'.
NOTE: Performing serial SaveTable action using SAS Data Connector to SQL Server.
WARNING: WARNING: [SAS][ODBC SQL Server Wire Protocol driver]Setting QuotedIdentifier to 'OFF' is not supported with this server.
The value of EnabledQuotedIdentifiers has been changed to 1
WARNING: WARNING: [SAS][ODBC SQL Server Wire Protocol driver]Setting QuotedIdentifier to 'OFF' is not supported with this server.
The value of EnabledQuotedIdentifiers has been changed to 1
NOTE: Cloud Analytic Services saved the file prdsal2 in caslib AZSQLWS.
NOTE: Action 'table.save' used (Total process time):
..........
.................
Results :
The following screenshot describes the intermediate file staged at Azure ADLS2 Blob Storage before copying it into the Synapse Database table. The intermediate files did not get deleted from the Blob Storage folder by using “deleteDataFiles =false” in the CASLIB statement, by default it’s set to “true”.
The VA application also provide a new source type “Azure Synapse SQL” when creating a new data source connection.
Important Links:
Documents:
Microsoft SQL Server Data Connector
Bulk Loading with Microsoft SQL Server
What is Azure Synapse SQL Pool ?
Articles:
Cas Accessing Azure Data-lake Files
Manage Azure Access Key with AZUREAUTHCACHELOC
Find more articles from SAS Global Enablement and Learning here.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.