Have you come across this question: How to access XLSX data files from an on-premises data center to a cloud-hosted SAS Viya environment without making a copy to cloud storage?
If the XLSX files are stored on Network File System (NFS) in the on-premises data center, the Remote Data Agent can access them using an ODBC driver connection. With the SAS Viya Cloud Data Exchange (CDE), the CDE administrator can deploy and configure the Remote Data Agent with an ODBC driver on an on-premises server to access the CSV, XLSX data files.
In this post, I discuss the deployment and configuration of the XLSX ODBC driver data source at the Remote Data agent.
The XLSX ODBC driver is not included with the SAS Remote Data Agent deployment. You need to purchase and download the XLSX ODBC driver from a supported vendor. The CDE administrator needs to deploy the XLSX ODBC driver and prepare the odbc.ini with ODBC connection details. The ODBC driver and ODBC configuration file must be placed at a location mounted to the Remote Data Agent container. A SAS administrator can configure the ODBC data source at the Remote Data Agent server.
The Remote Data Agent is a containerized software that runs with Docker runtime. The CDE administrator can mount an NFS location with the ODBC driver, an odbc.ini file, and the XLSX data files while starting the Remote Data Agent container.
The following pics describe the local/NFS folder with the XLS ODBC driver and an odbc.ini file mounted to the Remote Data Agent container and access to the XLS Data file folder. The XLS ODBC driver can only read the XLSX file (no write).
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
The following steps describe the XLSX ODBC data source configuration at the Remote Data Agent Server.
The XLSX ODBC driver is not included with the SAS Remote Data Agent deployment. You need to purchase and download the XLSX ODBC driver from a supported vendor. In a test environment, I have downloaded the CDATA XLSX ODBC driver to deploy and configure at the Remote Data Agent server. You can use any other vendor’s Unix-supported XLSX ODBC driver.
The CDATA XLSX ODBC driver (ExcelODBCDriverforUnix.deb)can be downloaded from the site: https://www.cdata.com/drivers/excel/odbc/
Before you can deploy the XLS ODBC driver, you may have to deploy a few pre-requisite packages on the operating system.
Code:
sudo apt-get install libc6 libstdc++6 zlib1g libgcc1 -y
It may also require the Unix driver manager application.
Code:
sudo apt-get install unixodbc unixodbc-dev -y
The CDATA XLSX ODBC driver can be installed using following statement.
Code:
sudo dpkg -i ~/ExcelODBCDriverforUnix.deb
By default, the CDATA XLSX ODBC driver will be installed under /opt/cdata/cdata-odbc-driver-for-excel folder. It will require applying the purchased/trial license by using the following statement.
Code:
/opt/cdata/cdata-odbc-driver-for-excel/bin/install-license.sh
The CDE connection to the XLSX file is an ODBC connection. An administrator must prepare an odbc.ini file with the path to the XLSX ODBC driver and the XLSX data file. The XLSX data file, odbc.ini, and XLSX ODBC driver path (mounted) must be reachable from the Remote Data Agent container. The folder and files containing the odbc.ini file should have read permission for user “sas” (uid 1001:1001). You may copy the sample odbc.ini and odbcinst.ini files from /etc folder to the SAS access-client subfolder and append/update an XLSX ODBC DSN entry. The following is an odbc.ini example; you can customize it before using it in your environment.
The value in XLSX URI contains a folder name containing a number of xlsx files with multiple sheets in it. Each tab from the xlsx file will be viewed as a separate table.
Code:
cp /etc/odbc* /viya-share/gelenv/config/access-clients/odbc
tee -a /viya-share/gelenv/config/access-clients/odbc/odbc.ini > /dev/null << EOF
[CDATA_XLS]
Driver=/opt/cdata/cdata-odbc-driver-for-excel/lib/libexcelodbc.x64.so
ConnectionType=Local
URI=/viya-share/gelenv/data
AuthScheme=None
EOF
You can test the XLSX ODBC configuration from the virtual machine (RDA server) where you have installed the driver. It requires initialization of ODBC-related system variables before you can test it. By using the ‘isql’ utility you can test and verify the ODBC configuration and access the XLSX data files.
Code:
export ODBCHOME=/opt/cdata/cdata-odbc-driver-for-excel
export ODBCINI=/viya-share/gelenv/config/access-clients/odbc/odbc.ini
export ODBCINST=/viya-share/gelenv/config/access-clients/odbc/odbcinst.ini
isql CDATA_XLS -v
help ;
select * from "class4.xlsx".class1 limit 5 ;
quit;
Log:
jumpuser@cargo-p11125-jump-vm:~$
jumpuser@arbor-p03072-jump-vm:~$isql CDATA_XLS -v
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL> help
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| CData | class1.xlsx | class1 | TABLE | Retrieve data from the "class1" sheet. |
| CData | class1.xlsx | class2 | TABLE | Retrieve data from the "class2" sheet. |
| CData | class1.xlsx | class3 | TABLE | Retrieve data from the "class3" sheet. |
| CData | class4.xlsx | class1 | TABLE | Retrieve data from the "class1" sheet. |
| CData | class4.xlsx | class2 | TABLE | Retrieve data from the "class2" sheet. |
| CData | class4.xlsx | class3 | TABLE | Retrieve data from the "class3" sheet. |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
SQLRowCount returns -1
6 rows fetched
SQL>
SQL> select * from "class4.xlsx".class1 limit 5 ;
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| RowId | Name | Sex | Age | Height | Weight |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2 | Alfred | M | 14 | 69 | 112.5 |
| 3 | Alice | F | 13 | 56.5 | 84 |
| 4 | Barbara | F | 13 | 65.3 | 98 |
| 5 | Carol | F | 14 | 62.8 | 102.5 |
| 6 | Henry | M | 14 | 63.5 | 102.5 |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
SQLRowCount returns -1
5 rows fetched
SQL> quit;
jumpuser@cargo-p11125-jump-vm:~$
Before you can use the XLSX ODBC driver and create a data source at RDA, you need to update the da-vars.env and sas-access.properties file with ODBC-related environment variables. Users can include and update the respective access engine environment variables. The following variable is related to the XLSX ODBC client.
Code:
da-vars.env:
## Update the LD_LIBRARY_PATH with the ODBC driver path.
LD_LIBRARY_PATH=/usr/lib64;/opt/cdata/cdata-odbc-driver-for-excel/lib
sas-access.properties :
## Update the ODBC variables with the ODBC client folder.
ODBCHOME=/opt/cdata/cdata-odbc-driver-for-excel
ODBCINI=/viya-share/gelenv/config/access-clients/odbc/odbc.ini
ODBCINST=/viya-share/gelenv/config/access-clients/odbc/odbcinst.ini
The odbc.ini folder must be owned by the user SAS for the Remote Data Agent services to access the ODBC driver and data files.
sudo chown 1001:1001 -R /viya-share/gelenv/config/access-clients/odbc;
ls -l /viya-share/gelenv/config/access-clients/odbc
Start the RDA service with the local data folder (e.g. /viya-share/gelenv, and /opt/cdata) mounted to the RDA container. The mounted folder contains the XLSX ODBC driver, XLSX data file, and odbc.ini file.
Code:
./container-manager start \
--license-path ${SASVIYA_LIC} \
--sasdata sasdata \
--vars da-vars.env \
--data sasdata/data \
--mount /viya-share/gelenv:/viya-share/gelenv,/opt/cdata:/opt/cdata \
--access-vars sas-access.properties \
sas-data-agent-server-remote
You may have to install the CDATA XLSX ODBC license inside the Remote Data Agent container. It depends on the ODBC vendor and how the license is distributed.
Code:
## to Log in to RDA container
docker exec -it sas-data-agent-server-remote bash
Code:
/opt/cdata/cdata-odbc-driver-for-excel/bin/install-license.sh
The SAS-Viya CLI is the user interface to manage and maintain the configurations at the SAS Cloud Data Exchange. An administrator must log in and authenticate a user for sas-viya CLI before creating data source connections.
Code:
source ~/sasviay_cli_env_vars.txt
export GELLOW_NAMESPACE=gelenv
export SAS_CLI_PROFILE=${GELLOW_NAMESPACE:-Default} ;
export SSL_CERT_FILE=~/.certs/${GELLOW_NAMESPACE}_trustedcerts.pem ;
sas-viya auth login --user $Uid --password $Pwd
sas-viya profile set-output text ;
sas-viya dagentsrv servers set-default --data-agent sas-data-agent-server-remote
The following statement creates an XLSX ODBC data source service with the ODBC DSN name defined in the odbc.ini
Code:
sas-viya dagentsrv data-services create odbc --name odbc_xls2 --driver odbc --dsn CDATA_XLS
sas-viya dagentsrv services list
Log:
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$sas-viya dagentsrv data-services create odbc --name odbc_xls2 --driver odbc --dsn CDATA_XLS
The data service was successfully created.
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$sas-viya dagentsrv services list
Name Type Domain Version Options
BASE base --- 2.6 ---
__SERVER__ server --- 2.6 PURGE_CACHE=30;CASE_SENSITIVITY=(OBJECT=F;COLUMN=F);CACHE=(NAME=AS;TIMEOUT=300)
odbc_csv2 odbc odbc_csv2 2.6 CONOPTS=(DRIVER=odbc;ODBC_DSN=CDATA_CSV);CASE_SENSITIVITY=(OBJECT=F;COLUMN=F)
odbc_xls2 odbc odbc_xls2 2.6 CONOPTS=(DRIVER=odbc;ODBC_DSN=CDATA_XLS);CASE_SENSITIVITY=(OBJECT=F;COLUMN=F)
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$
The following statement tests the XLSX ODBC data source service created in the previous step.
Code:
sas-viya dagentsrv data-sources test --name odbc_xls2
Log:
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$sas-viya dagentsrv data-sources test --name odbc_xls2
Successfully connected to data source odbc_xls2.
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$
With a successful test connection to an XLSX ODBC data source, a user can list the catalog, schemas, tables, and data (each xlsx file with multiple tabs) from each tab using the sas-viya CLI.
Code:
sas-viya dagentsrv ds catalogs list --data-source odbc_xls2
sas-viya dagentsrv ds schemas list --catalog odbc_xls2 --data-source odbc_xls2
sas-viya dagentsrv ds tables list --catalog odbc_xls2 --schema 'class1.xlsx' --data-source odbc_xls2
sas-viya dagentsrv sql --sql "select * from \"class1.xlsx\".class3 limit 5 " --dsn odbc_xls2
Log:
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$sas-viya dagentsrv ds catalogs list --data-source odbc_xls2
Name Driver Description
odbc_xls2 ODBC
jumpuser@arbor-p03072-jump-vm:~/CDERemoteDA_Deploy$sas-viya dagentsrv ds schemas list --catalog odbc_xls2 --data-source odbc_xls2
Name
class1.xlsx
class4.xlsx
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$sas-viya dagentsrv ds tables list --catalog odbc_xls2 --schema 'class1.xlsx' --data-source odbc_xls2
Name Type Native Catalog
class1 TABLE CData
class2 TABLE CData
class3 TABLE CData
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$
jumpuser@arbor-p11125-jump-vm:~/CDERemoteDA_Deploy$sas-viya dagentsrv sql --sql "select * from \"class1.xlsx\".class3 limit 5 " --dsn odbc_xls2
Age Height Name RowId Sex Weight
=== ====== ==== ===== === ======
14 69 Alfred 2 M 112.5
13 56.5 Alice 3 F 84
13 65.3 Barbara 4 F 98
14 62.8 Carol 5 F 102.5
14 63.5 Henry 6 M 102.5
jumpuser@arbor-p11125-jump-vm:~/CDERemoteDA_Deploy$
Note: The first row of the .xlsx tab appears as column names. If the first row is a data row, it will be considered as the column name.
With the XLSX ODBC data source configured at the Remote Data Agent, users can access on-premise XLSX data from the SAS Compute Server using the CDE LIBNAME statement. The following code describes access to an XLSX data file sheet via a Remote Data Agent data source. The code will display the first 10 rows from the class4.xlsx file and the sheet class3.
Code:
LIBNAME cdexls2 cde dataagentname="sas-data-agent-server-remote"
dsn=odbc_xls2 preserve_tab_names=yes;
Proc SQL outobs=10;
select * from cdexls2.class3(schema='class4.xlsx') ;
run;quit;
Log:
79
80 LIBNAME cdexls2 cde dataagentname="sas-data-agent-server-remote"
81 dsn=odbc_xls2 preserve_tab_names=yes;
NOTE: Libref CDEXLS2 was successfully assigned as follows:
Engine: CDE
Physical Name: odbc_xls2
83
84 Proc SQL outobs=10;
85 select * from cdexls2.class3(schema='class4.xlsx') ;
WARNING: Statement terminated early due to OUTOBS=10 option.
86 run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed page 4.
With the XLSX ODBC data source configured at the Remote Data Agent, users can access on-premise XLSX data from the CAS using the CDE data connector. The following code describes the access to an XLSX data file and sheet via a Remote Data Agent data source using CDE CASLIB. The code will load CAS table from the class4.xlsx file and the sheet class3.
Code:
CAS mySession SESSOPTS=(CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
caslib cdexls2 datasource=(
srctype="clouddex",
dataAgentName="sas-data-agent-server-remote",
catalog="odbc_xls2",
conopts="dsn=odbc_xls2"
) libref=cdexls2;
proc casutil incaslib="cdexls2";
list files ;
run;quit;
/* CAS load from CDE RDA XLS ODBC table */
proc casutil incaslib="cdexls2" outcaslib="cdexls2";
load casdata="class3" casout="class_xls"
datasourceoptions=(schema="class4.xlsx" ) replace ;
list tables ;
run;quit;
cas mysession terminate;
Log:
83
84 caslib cdexls2 datasource=(
85 srctype="clouddex",
86 dataAgentName="sas-data-agent-server-remote",
87 catalog="odbc_xls2",
88 conopts="dsn=odbc_xls2"
89 ) libref=cdexls2;
NOTE: Executing action 'table.addCaslib'.
NOTE: 'CDEXLS2' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'CDEXLS2'.
NOTE: Action 'table.addCaslib' used (Total process time):
NOTE: real time 0.011001 seconds
94
95 /* CAS load from CDE RDA XLS ODBC table */
96 proc casutil incaslib="cdexls2" outcaslib="cdexls2";
NOTE: The UUID '5575f964-b5c2-9a49-9d76-f339c75e0320' is connected using session MYSESSION.
97 load casdata="class3" casout="class_xls"
98 datasourceoptions=(schema="class4.xlsx" ) replace ;
NOTE: Executing action 'table.loadTable'.
NOTE: Connecting using the OAuth token for CAS user 'geldmui@gelenable.sas.com'.
WARNING: The specified domain, "odbc_xls2", is unknown.
WARNING: Using server default session timeout of 3600
WARNING: Function completed successfully, possibly with a non-fatal error (warning).
WARNING: Function completed successfully, possibly with a non-fatal error (warning).
NOTE: Cloud Analytic Services made the external data from class3 available as table CLASS_XLS in caslib cdexls2.
NOTE: Action 'table.loadTable' used (Total process time):
Many thanks to Brian Hess for sharing CDE expertise and help on this post.
Important Links:
Remote SAS Data Agent: Deployment and Administration Guide
Communities Posts:
Cloud Data Exchange for the SAS Viya Platform
SAS Viya Cloud Data Exchange Deployment and Configuration (Part -1)
SAS Viya Cloud Data Exchange Deployment and Configuration (Part -2)
Configuring BASE SAS Data Source at Remote Data Agent (Cloud Data Exchange)
Configuring Oracle Data Source at Remote Data Agent (Cloud Data Exchange)
Configuring SAS Federated Data Source at Remote Data Agent (Cloud Data Exchange)
Configuring DB2 Data Source at Remote Data Agent (Cloud Data Exchange)
Configuring MS-SQL Server Data Source at Remote Data Agent (Cloud Data Exchange)
Configuring CSV ODBC Data Source at Remote Data Agent (Cloud Data Exchange)
Find more articles from SAS Global Enablement and Learning here.
You seem to be only showing using this "XLS ODBC" tool to access the newer XLSX file format files. Does it also support the older XLS file format (as its name implies)?
@Tom , It's dependent on third party ODBC driver support. If the ODBC driver can read an older version of xls file format, the RDA will work with it. I have latest CDATA trial version of Unix ODBC driver in environment, which supported xlsx file format.
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.