BookmarkSubscribeRSS Feed

SAS Viya(CAS) access to Azure Synapse SQL Pool (SQL DW)

Started ‎05-07-2021 by
Modified ‎05-07-2021 by
Views 8,169

With SAS Viya 4 release, Viya(CAS) user can access Azure Synapse SQL-Pool (formerly SQL DW) database table using MS-SQL Server SAS Data Connector. It also enables SAS Viya users to load CAS from various types of data files (parquet, orc, Json, etc..) stored at ADLS2 via Azure Synapse. Users can read these data files into a columnar data table under Azure Synapse before loading them into CAS.

 

This post is about accessing Azure Synapse SQL Pool (SQL DW) database from SAS Viya 4(CAS) environment.  

 

What is Azure Synapse SQL Pool ?

Azure Synapse SQL Pool (formerly SQL DW) represents a collection of analytic resources that are provisioned when using Synapse SQL. Azure Synapse SQL Pool is the MS-SQL server database running at cloud in distributed mode on n-number of servers/machines.    

 

The MS-SQL Server SAS Data Connector enables Viya(CAS) users to access the Azure Synapse database. A user can use either serial or multi-node data load method to load CAS from the Azure Synapse database table. It also supports BULK data save from CAS table to Azure Synapse SQL Pool table using Azure Storage for intermediate data files. The Bulk load process uploads the large/bulk data into an ADLS2 Blob Container before using the COPY INTO statement to push the data into the Synapse database table.  

 

Pre-requisites

 

  • SAS Viya 4 with MS-SQL Server Data Connector at CAS Nodes(PODs)
  • Azure File Share mounted to CAS Nodes using AKS Persistence Volume to store and access centrally located ODBC.ini and Azure Access key
  • Customized DSN name in odbc.ini to access Azure Synapse database
  • User access to Azure Synapse database
  •  
    For Bulk Load :
  • User permission to Azure Storage Account with Storage Blob Data Contributor role
  • Azure user application with permission to access Azure Data lake and Azure Storage
  • Azure Access Key at central location available to CAS node to access Blob Storage
  • MS-SQL server configured with Managed Identity or Shared Access Signature (SAS) token to access Blob Storage

 

Data access path

The following picture describes the SAS Viya 4(CAS) environment access to Azure Synapse SQL Pool (SQL DW) database table. It also describes the data path for BULK data upload from CAS to the Azure Synapse Database table. A user can use either the Serial or Multi-node data load method to load CAS from the Azure Synapse database table.  

 

 

ut_1_CAS_Access_to_Synapse_BulkLoad_1.png

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

Azure Storage Account configuration for BULK Load

To bulk data load to Azure Synapse SQL Pool database from CAS table, a user must have Azure storage location defined with read-write permission. An operational Azure Application enabled with native communication to obtain Azure Access Key at CAS nodes (PODs).  

 

The details steps are discussed in article  Cas Accessing Azure Data-lake Files, for configuring Azure user application and Storage Account to access ALDS2 Blob storage.  

 

Azure MS-SQL Server configuration for BULK Load

To bulk load the Azure Synapse database, the MS-SQL server requires access to Azure ADLS2 Blob Storage with read-write permission. You can use Storage Shared Access Signature(SAS) token in the CASLIB statement or configure the MS-SQL server with “--assign-identity” and assign the “Storage Blob Data Contributor” role to the SQL server “principle id” for seamless access to Blob Storage. The following Azure CLI describes the creation of the MS-SQL Server with “--assign-identity”, and role assignment to Server Principle id.  

 

CLI Code:

 

STUDENT=”Myname” 
AKS_MY_LOCATION=”eastus2”
AKS_MY_RESOURCE_GROUP="${STUDENT}viya4"
AKS_MY_USER=viyadep

srv=sqlsrv
sql_server_name=$AKS_MY_RESOURCE_GROUP$srv
sql_server_admin_user=$AKS_MY_USER
sql_server_admin_pwd=”XXXXXXXXX”
sqldw_name=geldwpool

###  Create a SQL Server with –assign-identity ####
sql_serv_sts=`az sql server create \
--name $sql_server_name \
--admin-user $sql_server_admin_user \
--admin-password $sql_server_admin_pwd \
--resource-group $AKS_MY_RESOURCE_GROUP \
--location $AKS_MY_LOCATION \
--enable-public-network true \
--assign-identity \
-o tsv `

### Create an Azure Synapse MS-SQL POOL  Database in SQL Server ###
sql_serv_dw_sts=`az sql dw create \
--name $sqldw_name \
--server $sql_server_name \
--resource-group $AKS_MY_RESOURCE_GROUP \
--service-objective DW100c \
--tags name=$STUDENT \
--zone-redundant false `

### Assign Storage Blob Data Contributor Role to MS-SQL server managed Identity at Resource group level scope .###

spID=`az resource list -n $sql_server_name --query [*].identity.principalId --out tsv `
rgID=`az group show -g $AKS_MY_RESOURCE_GROUP --query id -o tsv`

server_role_tst=`az role assignment create --assignee $spID --role 'Storage Blob Data Contributor' --scope $rgID`

 

Customized DSN

A customized DSN entry in the odbc.ini file to access the Azure SQL Pool database with hostname, database name, and SSL lib path, etc.  

 

…….
………………
[sqls_gelws]
Driver=/opt/sas/viya/home/lib64/accessclients/lib/S0sqls28.so
Description=SAS ACCESS to MS SQL Server
AEKeyCacheTTL=-1
AEKeystorePrincipalId=
AEKeystoreClientSecret=
AlternateServers=
AlwaysReportTriggerResults=0
AnsiNPW=1
ApplicationIntent=0
ApplicationName=
ApplicationUsingThreads=1
AuthenticationMethod=1
BulkBinaryThreshold=32
BulkCharacterThreshold=-1
BulkLoadBatchSize=1024
BulkLoadFieldDelimiter=
BulkLoadOptions=2
BulkLoadRecordDelimiter=
BulkLoadThreshold=2
ColumnEncryption=Disabled
ConnectionReset=0
ConnectionRetryCount=0
ConnectionRetryDelay=3
CryptoProtocolVersion=TLSV1,TLSV1.1,TLSV1.2,SSLv3, SSLv2
Database=geldwpool
Domain=
EnableBulkLoad=0
EnableQuotedIdentifiers=0
EnableServerSideCursors=1
EnableScrollableCursors=4
EncryptionMethod=1
FailoverGranularity=0
FailoverMode=0
FailoverPreconnect=0
FetchTSWTZasTimestamp=0
FetchTWFSasTime=1
GSSClient=native
HostName=utkumaviya4sqlsrv.database.windows.net
HostNameInCertificate=
#IANAAppCodePage=
InitializationString=
KeepAlive=0
Language=
LoadBalanceTimeout=0
LoadBalancing=0
LoginTimeout=15
LogonID=viyadep
MaxPoolSize=100
MinPoolSize=0
MultiSubnetFailover=0
PacketSize=-1
Pooling=0
PortNumber=1433
PRNGSeedFile=/dev/random
PRNGSeedSource=0
ProxyHost=
ProxyMode=0
ProxyPassword=
ProxyPort=
ProxyUser=
QueryTimeout=0
ReportCodepageConversionErrors=0
SnapshotSerializable=0
TrustStore=
TrustStorePassword=
ValidateServerCertificate=0
WorkStationID=
XMLDescribeType=-10
SSLLibName=/usr/lib64/libssl.so.10
CryptoLibName=/usr/lib64/libcrypto.so.10
…….
………………

CAS load/save (serial) from Azure Synapse SQL Pool table

 

With Viya4 (CAS) environment configured with customized DSN entry in odbc.ini, you can use the following code to read and write CAS data table into Azure Synapse Database. The code is using the serial method to load and save CAS data.

 

Code:

 

CAS mySession  SESSOPTS=(CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);

%let MYUID='viyadep';
%let MYPWD='XXXXXXX';
%let MYDSN='sqls_gelws' ;
%let MYSCHEMA='dbo';

caslib azsqlws desc='Microsoft SQL Server Caslib'
dataSource=(srctype='sqlserver',
username=&MYUID,
password=&MYPWD,
schema=&MYSCHEMA,
sqlserver_dsn=&MYDSN );


/* ## Save CAS data to Azure Synapse pool database table */
proc casutil outcaslib="azsqlws" incaslib="azsqlws";
load data=sashelp.cars casout="cars" replace;
save casdata="cars" casout="cars"  replace;
list files;
quit;

/* ## CAS load from Azure Synapse poll database table. */
proc casutil incaslib="azsqlws" outcaslib="azsqlws";
load casdata="cars" casout="cars_new" replace;
list tables;
quit;

CAS mySession  TERMINATE;

 

Log extract :

 

........
...............
83
84   caslib azsqlws desc='Microsoft SQL Server Caslib'
85   dataSource=(srctype='sqlserver',
86   username=&MYUID,
87   password=&MYPWD,
88   sqlserver_dsn=&MYDSN );
NOTE: Executing action 'table.addCaslib'.
NOTE: 'AZSQLWS' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'AZSQLWS'.
NOTE: Action 'table.addCaslib' used (Total process time):
NOTE: Action to ADD caslib AZSQLWS completed for session MYSESSION.
89


78   /* Save CAS data to Azure Synapse pool database table */
79   proc casutil outcaslib="azsqlws" incaslib="azsqlws";
NOTE: The UUID 'f1cbd14f-2b65-6842-b82d-581685eb4072' is connected using session MYSESSION.
80   load data=sashelp.cars casout="cars" replace;
NOTE: The INCASLIB= option is ignored when using the DATA= option in the LOAD statement.
NOTE: Executing action 'table.addTable'.
NOTE: Action 'table.addTable' used (Total process time):
NOTE: SASHELP.CARS was successfully added to the "AZSQLWS" caslib as "CARS".
81   save casdata="cars" casout="cars"  replace;
NOTE: Executing action 'table.save'.
NOTE: Performing serial SaveTable action using SAS Data Connector to SQL
NOTE: Cloud Analytic Services saved the file cars in caslib AZSQLWS.
NOTE: Action 'table.save' used (Total process time):
NOTE: The Cloud Analytic Services server processed the request in 47.475573 seconds.



78   /*  CAS load from Azure Synapse poll database table. */
79   proc casutil incaslib="azsqlws" outcaslib="azsqlws";
NOTE: The UUID 'f1cbd14f-2b65-6842-b82d-581685eb4072' is connected using session MYSESSION.
80   load casdata="cars" casout="cars_new" replace;
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to SQL Server.
WARNING: WARNING: [SAS][ODBC SQL Server Wire Protocol driver]Setting QuotedIdentifier to 'OFF' is not supported with this server.
         The value of EnabledQuotedIdentifiers has been changed to 1
NOTE: Cloud Analytic Services made the external data from cars available as table CARS_NEW in caslib azsqlws.
NOTE: Action 'table.loadTable' used (Total process time):
............
.................

 

Results :

 ut_2_CAS_Access_to_Synapse_BulkLoad_2.png

 

CAS load/save(multi-node) from Azure Synapse SQL Pool table

With Viya4 (CAS) environment configured with customized DSN entry in odbc.ini, the following code can be used to load CAS from the Azure Synapse table using the multi-node method. The CASLIB parameter numreadnodes= and numwritenodes= trigger the multi-node data load and save method. The odbc.ini is centrally located at Azure file share and available to all the CAS nodes (PODs).  

 

Code:

 

CAS mySession  SESSOPTS=(CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);

%let MYUID='viyadep';
%let MYPWD='XXXXXXXX';
%let MYDSN='sqls_gelws' ;
%let MYSCHEMA='dbo';

caslib azsqlws desc='Microsoft SQL Server Caslib'
dataSource=(srctype='sqlserver',
username=&MYUID,
password=&MYPWD,
schema=&MYSCHEMA,
sqlserver_dsn=&MYDSN,
numreadnodes=10,
numwritenodes=10
);

proc casutil incaslib="azsqlws" outcaslib="azsqlws";
load casdata="fish_sas" casout="fish_sas" options=(sliceColumn="weight")  replace;
list tables;
quit;

CAS mySession  TERMINATE;

 

Log extract :

 

........
.............
84   caslib azsqlws desc='Microsoft SQL Server Caslib'
85   dataSource=(srctype='sqlserver',
86   username=&MYUID,
87   password=&MYPWD,
88   sqlserver_dsn=&MYDSN
89   numreadnodes=10,numwritenodes=10,
93   );
NOTE: Executing action 'table.addCaslib'.
NOTE: 'AZSQLWS' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'AZSQLWS'.
NOTE: Action 'table.addCaslib' used (Total process time):


77
78   proc casutil incaslib="azsqlws" outcaslib="azsqlws";
NOTE: The UUID 'e6ff8b32-53b2-7548-8ea8-ae83d9734618' is connected using session MYSESSION.
79   load casdata="fish_sas" casout="fish_sas" options=(sliceColumn="weight")  replace;
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to SQL Server.
WARNING: WARNING: [SAS][ODBC SQL Server Wire Protocol driver]Setting QuotedIdentifier to 'OFF' is not supported with this server.
         The value of EnabledQuotedIdentifiers has been changed to 1
WARNING: The value of numReadNodes(10) exceeds the number of available worker nodes(3). The load will proceed with numReadNodes=3.
NOTE: Cloud Analytic Services made the external data from fish_sas available as table FISH_SAS in caslib azsqlws.
NOTE: Action 'table.loadTable' used (Total process time):
NOTE:       real time               4.875197 seconds
............
.................

 

Results :

ut_3_CAS_Access_to_Synapse_BulkLoad_3.png

 

Bulk load to Azure Synapse SQL Pool table from CAS table

With SAS Viya 4 (CAS) environment configured with customized DSN entry in odbc.ini, MS-SQL Server configured to access ADLS2 blob storage, and Azure user application configured to access resources using device code method, you can use the following code to BULK load CAS table to Azure Synapse database table. The odbc.ini and Azure Access Key are centrally located at Azure file share and available to all the CAS nodes (Pods).  

 

The CAS table Bulk load to Azure Synapse works in a multi-step process. In the first step, CAS creates an empty table structure at the Azure synapse database using a serial database connection. In the second step, BULK loads the data file from all CAS nodes to ADLS2 storage. In the third step, it submits a COPY INTO statement to the SQL database server to push the data from the ADLS2 storage to the Synapse database table.  

 

Very first-time usage of CASLIB with BULK load statement generates an error message for Azure authentication. The SAS log displays the Microsoft device authentication URL and device code to authenticate and generate Azure Access Key.   

 

To bulk load CAS table with a special character (like “”,’,`,!,#,$,@, etc.) in data elements use “ESCAPE=YES” in the CASLIB statement. The use of this option in the CASLIB statement will slow down the performance as it requires additional scanning/processing of data elements.  

 

To verify the temporary files at the ADLS2 storage location for debugging, use deleteDataFiles=false in the CASLIB statement  

 

Code:

%let MYSTRGACC="utkumaviya4adls2";
%let MYSTRGFS="fsdata";
%let MYFOLDER="bl_data";
%let MYAZUID="MANAGED IDENTITY";
%let MYDNSSUFFIX="dfs.core.windows.net" ;
%let MYTNTID="b1c14d5c-3625-45b3-XXXXXXXXXXXXXXXXXX";
%let MYAPPID="a2e7cfdc-93f8-4f12-XXXXXXXXXXXXXXXXXX";

%let MYUID='viyadep';
%let MYPWD='XXXXXXXXX';
%let MYDSN='sqls_gelws';
%let MYSCHEMA='dbo';

CAS mySession  SESSOPTS=(azuretenantid=&MYTNTID CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);

caslib azsqlws desc='Microsoft SQL Server Caslib'
dataSource=(srctype='sqlserver',
username=&MYUID,
password=&MYPWD,
Schema=&MYSCHEMA,
sqlserver_dsn=&MYDSN ,
identity=&MYAZUID ,
bulkload='true',
FOLDER=&MYFOLDER,
accountname=&MYSTRGACC,
filesystem=&MYSTRGFS,
dnsSuffix=dfs.core.windows.net,
applicationId=&MYAPPID,
);


/* ## Save CAS data to Azure Synapse pool database table */
proc casutil outcaslib="azsqlws" incaslib="azsqlws";
load data=sashelp.prdsal2 casout="prdsal2" replace;
save casdata="prdsal2" casout="prdsal2"  replace;
list files;
quit;

CAS mySession  TERMINATE;

 

Very-first time Error Log:

 

…………
…………………………..
ERROR: To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code XXXXXXXXX to
       authenticate.
…………
…………………………..

 

Log extract :

 

.....
.........
93
94   caslib azsqlws desc='Microsoft SQL Server Caslib'
95   dataSource=(srctype='sqlserver',
96   username=&MYUID,
97   password=&MYPWD,
98   Schema=&MYSCHEMA,
99   sqlserver_dsn=&MYDSN ,
100  identity=&MYAZUID ,
101  bulkload='true',
102  FOLDER=&MYFOLDER,
103  accountname=&MYSTRGACC,
104  filesystem=&MYSTRGFS,
105  dnsSuffix=dfs.core.windows.net,
106  applicationId=&MYAPPID
108  );
NOTE: Executing action 'table.addCaslib'.
NOTE: 'AZSQLWS' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'AZSQLWS'.
....
............

117  /* ## Save CAS data to Azure Synapse pool database table */
118  proc casutil outcaslib="azsqlws" incaslib="azsqlws";
NOTE: The UUID '1b53fcc7-1f8e-1a48-8d22-5e90a0641bb1' is connected using session MYSESSION.
........
................
120  save casdata="prdsal2" casout="prdsal2"  replace;
NOTE: Executing action 'table.save'.
NOTE: Performing serial SaveTable action using SAS Data Connector to SQL Server.
WARNING: WARNING: [SAS][ODBC SQL Server Wire Protocol driver]Setting QuotedIdentifier to 'OFF' is not supported with this server.
         The value of EnabledQuotedIdentifiers has been changed to 1
WARNING: WARNING: [SAS][ODBC SQL Server Wire Protocol driver]Setting QuotedIdentifier to 'OFF' is not supported with this server.
         The value of EnabledQuotedIdentifiers has been changed to 1
NOTE: Cloud Analytic Services saved the file prdsal2 in caslib AZSQLWS.
NOTE: Action 'table.save' used (Total process time):
..........
.................

 

Results :

 

ut_4_CAS_Access_to_Synapse_BulkLoad_4.png

 

 

The following screenshot describes the intermediate file staged at Azure ADLS2 Blob Storage before copying it into the Synapse Database table. The intermediate files did not get deleted from the Blob Storage folder by using “deleteDataFiles =false” in the CASLIB statement, by default it’s set to “true”.  

 

 

ut_5_CAS_Access_to_Synapse_BulkLoad_5.png

 

User interface option to create Azure Synapse CASLIB

The VA application also provide a new source type “Azure Synapse SQL” when creating a new data source connection.  

 

ut_6_CAS_Access_to_Synapse_BulkLoad_6.png

 

Constraints/Limitations

  • CAS always saves data to a new Synapse table using the BULK load process. The CAS data append to an existing Synapse database table is not supported.
  • As a workaround, the user can BULK load the data into Synapse “Temporary/Staging” table and in a separate step execute an explicit database SQL from to append the data into an existing database table.
  • The Synapse Bulk Load applies to one-way traffic, which means the process uploads data from CAS to Azure Synapse. The Bulk download from Azure Synapse to CAS is not supported.

   

 

Important Links:

 

Documents:

Microsoft SQL Server Data Connector

Bulk Loading with Microsoft SQL Server

What is Azure Synapse SQL Pool ?

 

Articles:

Cas Accessing Azure Data-lake Files

Manage Azure Access Key with AZUREAUTHCACHELOC    

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎05-07-2021 04:49 PM
Updated by:
Contributors

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags