Often, I have seen a question from the field, how to access CSV, XLS data files from an on-premise data center to a cloud-hosted SAS Viya environment without making a copy to cloud storage?
The Remote Data Agent can access the on-premise CSV, XLS data files using an ODBC driver connection. With the SAS Viya Cloud Data Exchange (CDE), the CDE administrator can deploy and configure the Remote Data Agent with an ODBC driver on an on-premise server to access the CSV, XLS data files.
In this post, I discuss the deployment and configuration of the CSV ODBC driver data source at the Remote Data agent.
The CSV ODBC driver is not included with the SAS Remote Data Agent deployment. You need to purchase and download the CSV ODBC driver from a supported vendor. The CDE administrator needs to deploy the CSV ODBC driver and prepare the odbc.ini with ODBC connection details. The ODBC driver and ODBC configuration file must be placed at a location mounted to the Remote Data Agent container. A SAS administrator can configure the ODBC data source at the Remote Data Agent server.
The Remote Data Agent is a containerized software that runs with docker runtime. The CDE administrator can mount an NFS location with the ODBC driver and an odbc.ini file while starting the Remote Data Agent container. The CSV ODBC driver can only read the CSV file (no write).
The following pics describe the local/NFS folder with the CSV ODBC driver and an odbc.ini file mounted to the Remote Data Agent container and access to the CSV Data file folder.
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
The following steps describe the CSV ODBC data source configuration at the Remote Data Agent Server.
The CSV ODBC driver is not included with the SAS Remote Data Agent deployment. You need to purchase and download the CSV ODBC driver from a supported vendor. In a test environment, I have downloaded the CDATA CSV ODBC driver to deploy and configure at the Remote Data Agent server. You can use any other vendor’s CSV ODBC driver.
The CDATA CSV ODBC driver (CSVODBCDriverforUnix.deb)can be downloaded from the site: https://www.cdata.com/drivers/csv/odbc/
Before you can deploy the CSV ODBC driver, you may have to deploy a few dependencies packages at operating system.
Code:
sudo apt-get install libc6 libstdc++6 zlib1g libgcc1
It may also require the Unix driver manager application.
Code:
sudo apt-get install unixodbc unixodbc-dev
The CDATA CSV ODBC driver can be installed using following statement.
Code:
sudo dpkg -i ~/CSVODBCDriverforUnix.deb
By default, the CDATA CSV ODBC driver will installed under /opt/cdata/cdata-odbc-driver-for-csv folder. It will require applying the purchased/trial license by using following statement.
Code:
/opt/cdata/cdata-odbc-driver-for-csv/bin/install-license.sh
The CDE connection to the CSV file is an ODBC connection. An administrator must prepare an odbc.ini file with the path to the CSV ODBC driver and the CSV data file. The CSV data file, odbc.ini, and CSV ODBC driver path (mounted) must be reachable from the Remote Data Agent container. The folder and files containing the odbc.ini file should have read permission for user “sas” (uid 1001:1001). You may copy the sample odbc.ini and odbcinst.ini files from /etc folder to the SAS access-client subfolder and append/update a CSV ODBC DSN entry. The following is an odbc.ini example; you can customize it before using it in your environment.The URI= variable represents the location for the CSV data files.
Code:
cp /etc/odbc* /viya-share/gelenv/config/access-clients/odbc
tee -a /viya-share/gelenv/config/access-clients/odbc/odbc.ini > /dev/null << EOF
[CDATA_CSV]
Driver=/opt/cdata/cdata-odbc-driver-for-csv/lib/libcsvodbc.x64.so
ConnectionType=Local
URI=file:///viya-share/gelenv/data
AuthScheme=None
EOF
sudo chown 1001:1001 -R /viya-share/gelenv/config/access-clients/odbc;
You can test the CSV ODBC configuration from the virtual machine (RDA server) where you have installed the driver. It requires initialization of ODBC-related system variables before you can test it. By using the ‘isql’ utility you can test and verify the ODBC configuration and access the CSV data files.
Code:
export ODBCHOME=/opt/cdata/cdata-odbc-driver-for-csv
export ODBCINI=/viya-share/gelenv/config/access-clients/odbc/odbc.ini
export ODBCINST=/viya-share/gelenv/config/access-clients/odbc/odbcinst.ini
isql CDATA_CSV -v
help ;
select * from "class.csv" limit 5 ;
quit;
Log:
jumpuser@cargo-p11125-jump-vm:~$isql CDATA_CSV -v
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL>
SQL> help ;
+----------------------------------------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
+------------------------------------------------------------------------------------+
| CData | CSV | cars_source.csv | TABLE | |
| CData | CSV | class.csv | TABLE | |
| CData | CSV | flight_reporting.csv | TABLE | |
+------------------------------------------------------------------------------------+
SQLRowCount returns -1
3 rows fetched
SQL>
SQL> select * from "class.csv" limit 5 ;
+-----------------+------------+-------------------------+-------------------------+
| Name | Sex | Age | Height | Weight |
+----------------------------------------------------------------------------------------------------------------+------------+-------------------------+-------------------------+
| Alfred | M | 14 | 69 | 112.5 |
| Alice | F | 13 | 56.5 | 84 |
| Barbara | F | 13 | 65.3 | 98 |
| Carol | F | 14 | 62.8 | 102.5 |
| Henry | M | 14 | 63.5 | 102.5 |
+----------------+------------+-------------------------+-------------------------+
SQLRowCount returns -1
5 rows fetched
SQL> quit
jumpuser@cargo-p11125-jump-vm:~$
Before you can use the CSV ODBC driver and create a data source at RDA, you need to update the da-vars.env and sas-access.properties file with ODBC-related environment variables. Users can include and update the respective access engine environment variables. The following variable is related to the CSV ODBC client.
Code:
da-vars.env:
## Update the LD_LIBRARY_PATH with the ODBC driver path.
LD_LIBRARY_PATH=/usr/lib64;/opt/cdata/cdata-odbc-driver-for-csv/lib
sas-access.properties :
## Update the ODBC variables with the ODBC client folder.
ODBCHOME=/opt/cdata/cdata-odbc-driver-for-csv
ODBCINI=/viya-share/gelenv/config/access-clients/odbc/odbc.ini
ODBCINST=/viya-share/gelenv/config/access-clients/odbc/odbcinst.ini
Start the RDA service with the local data folder (e.g. /viya-share/gelenv, and /opt/cdata) mounted to the RDA container. The mounted folder contains the CSV ODBC driver, SCV data files, and odbc.ini file.
Code:
./container-manager start \
--license-path ${SASVIYA_LIC} \
--sasdata sasdata \
--vars da-vars.env \
--data sasdata/data \
--mount /viya-share/gelenv:/viya-share/gelenv,/opt/cdata:/opt/cdata \
--access-vars sas-access.properties \
sas-data-agent-server-remote
You may have to install the CDATA CSV ODBC license inside the Remote Data Agent container. It depends on the ODBC vendor and how the license is distributed.
Code:
## to Log in to RDA container
docker exec -it sas-data-agent-server-remote bash
Code:
/opt/cdata/cdata-odbc-driver-for-csv/bin/install-license.sh
The SAS-Viya CLI is the user interface to manage and maintain the configurations at the SAS Cloud Data Exchange. An administrator must log in and authenticate a user for sas-viya CLI before creating data source connections.
Code:
source ~/sasviay_cli_env_vars.txt
export GELLOW_NAMESPACE=gelenv
export SAS_CLI_PROFILE=${GELLOW_NAMESPACE:-Default} ;
export SSL_CERT_FILE=~/.certs/${GELLOW_NAMESPACE}_trustedcerts.pem ;
sas-viya auth login --user $Uid --password $Pwd
sas-viya profile set-output text ;
sas-viya dagentsrv servers set-default --data-agent sas-data-agent-server-remote
The following statement creates a CSV ODBC data source service with the ODBC DSN name defined in the odbc.ini.
Code:
sas-viya dagentsrv data-services create odbc --name odbc_csv2 --driver odbc --dsn CDATA_CSV
sas-viya dagentsrv services list
Log:
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$sas-viya dagentsrv data-services create odbc --name odbc_csv2 --driver odbc --dsn CDATA_CSV
The data service was successfully created.
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$sas-viya dagentsrv services list
Name Type Domain Version Options
BASE base --- 2.6 ---
__SERVER__ server --- 2.6 PURGE_CACHE=30;CASE_SENSITIVITY=(OBJECT=F;COLUMN=F);CACHE=(NAME=AS;TIMEOUT=300)
odbc_csv2 odbc odbc_csv2 2.6 CONOPTS=(DRIVER=odbc;ODBC_DSN=CDATA_CSV);CASE_SENSITIVITY=(OBJECT=F;COLUMN=F)
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$
The following statement tests the CSV ODBC data source service created in the previous step.
Code:
sas-viya dagentsrv data-sources test --name odbc_csv2
Log:
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$sas-viya dagentsrv data-sources test --name odbc_csv2
Successfully connected to data source odbc_csv2.
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$
With a successful test connection to a CSV ODBC data source, a user can list the data table and data from the CSV data files using the sas-viya CLI.
Code:
sas-viya dagentsrv ds tables list --catalog odbc_csv2 --schema CSV --data-source odbc_csv2
sas-viya dagentsrv sql --sql "select * from \"class.CSV\" limit 5 " --dsn odbc_csv2
Log:
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$sas-viya dagentsrv ds tables list --catalog odbc_csv2 --schema CSV --data-source odbc_csv2
Name Type Native Catalog
cars_source.csv TABLE CData
class.csv TABLE CData
flight_reporting.csv TABLE CData
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$sas-viya dagentsrv sql --sql "select * from \"class.CSV\" limit 5 " --dsn odbc_csv2
Age Height Name Sex Weight
=== ====== ==== === ======
14 69 Alfred M 112.5
13 56.5 Alice F 84
13 65.3 Barbara F 98
14 62.8 Carol F 102.5
14 63.5 Henry M 102.5
jumpuser@cargo-p11125-jump-vm:~/CDERemoteDA_Deploy$
Note: The first row of the .csv file appears as column names. If the first row is a data row, it will show up in the column name.
With the CSV ODBC data source configured at the Remote Data Agent, users can access on-premise CSV data files from the SAS Compute Server using the CDE LIBNAME statement. The following code describes the access to a CSV data file via a Remote Data Agent data source.
Code:
LIBNAME cdecsv2 cde dataagentname="sas-data-agent-server-remote"
dsn=odbc_csv2 preserve_tab_names=yes;
Proc SQL outobs=10;
select * from cdecsv2.'class.csv'n ;
run;quit;
Log:
80
81 LIBNAME cdecsv2 cde dataagentname="sas-data-agent-server-remote"
82 dsn=odbc_csv2 preserve_tab_names=yes;
NOTE: Libref CDECSV2 was successfully assigned as follows:
Engine: CDE
Physical Name: odbc_csv2
83
84
85 Proc SQL outobs=10;
86 select * from cdecsv2.'class.csv'n ;
WARNING: Statement terminated early due to OUTOBS=10 option.
87 run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed page 4.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.14 seconds
cpu time 0.04 seconds
88
With the CSV ODBC data source configured at the Remote Data Agent, users can access on-premise CSV data files from the CAS using the CDE data connector. The following code describes the access to a CSV data file via a Remote Data Agent data source using CDE CASLIB.
Code:
CAS mySession SESSOPTS=(CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
caslib cdecsv2 datasource=(
srctype="clouddex",
dataAgentName="sas-data-agent-server-remote",
catalog="odbc_csv2",
schema="CSV",
conopts="dsn=odbc_csv2"
) libref=cdecsv2;
proc casutil incaslib="cdecsv2";
list files ;
run;
quit;
/* CAS load from CDE RDA CSV ODBC table */
proc casutil incaslib="cdecsv2" outcaslib="cdecsv2";
load casdata="class.csv" casout="class_csv" replace ;
list tables ;
run;
quit;
cas mysession terminate;
Log:
82
83 caslib cdecsv2 datasource=(
84 srctype="clouddex",
85 dataAgentName="sas-data-agent-server-remote",
86 schema="CSV",
87 catalog="odbc_csv2",
88 conopts="dsn=odbc_csv2"
89 ) libref=cdecsv2;
NOTE: Executing action 'table.addCaslib'.
NOTE: 'CDECSV2' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'CDECSV2'.
NOTE: Action 'table.addCaslib' used (Total process time):
NOTE: real time 0.011860 seconds
NOTE: cpu time 0.023530 seconds (198.40%)
NOTE: total nodes 3 (12 cores)
NOTE: total memory 94.03G
NOTE: memory 1.72M (0.00%)
NOTE: CASLIB CDECSV2 for session MYSESSION will be mapped to SAS Library CDECSV2.
NOTE: Action to ADD caslib CDECSV2 completed for session MYSESSION.
90
95
96 /* CAS load from CDE RDA CSV ODBC table */
97 proc casutil incaslib="cdecsv2" outcaslib="cdecsv2";
NOTE: The UUID 'ff28b734-a944-cc46-ac77-981e0f1951eb' is connected using session MYSESSION.
98 load casdata="class.csv" casout="class_csv" replace ;
NOTE: Executing action 'table.loadTable'.
NOTE: Connecting using the OAuth token for CAS user 'geldmui@gelenable.sas.com'.
WARNING: The specified domain, "odbc_csv2", is unknown.
WARNING: Using server default session timeout of 3600
NOTE: Cloud Analytic Services made the external data from class.csv available as table CLASS_CSV in caslib cdecsv2.
NOTE: Action 'table.loadTable' used (Total process time):
NOTE: real time 0.729696 seconds
Many thanks to Brian Hess for sharing the expert knowledge and help on this post.
Important Links:
Remote SAS Data Agent: Deployment and Administration Guide
Posts:
Cloud Data Exchange for the SAS Viya Platform
SAS Viya Cloud Data Exchange Deployment and Configuration (Part -1)
SAS Viya Cloud Data Exchange Deployment and Configuration (Part -2)
Configuring BASE SAS Data Source at Remote Data Agent (Cloud Data Exchange)
Configuring Oracle Data Source at Remote Data Agent (Cloud Data Exchange)
Configuring SAS Federated Data Source at Remote Data Agent (Cloud Data Exchange)
Configuring DB2 Data Source at Remote Data Agent (Cloud Data Exchange)
Configuring MS-SQL Server Data Source at Remote Data Agent (Cloud Data Exchange)
Find more articles from SAS Global Enablement and Learning here.
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.