BookmarkSubscribeRSS Feed
js5
Pyrite | Level 9 js5
Pyrite | Level 9

Hello,

 

we have been experimenting with uploading results generated with SAS to a Databricks SQL warehouse. The code looks as follows:

 

libname dbx odbc prompt="Driver={Simba Spark ODBC Driver};
Host=foo.cloud.databricks.com;
Port=443;
HTTPPath=/sql/1.0/warehouses/bar;
SSL=1;
ThriftTransport=2;
AuthMech=3;
UID=token;
PWD=baz;
Catalog=foofoo;
Schema=barbar;
DefaultStringColumnLength=32767"
dbcommit=10000
insertbuff=10000
readbuff=1000
dbcreate_table_opts="TBLPROPERTIES('delta.columnMapping.mode' = 'name', 'delta.checkpoint.writeStatsAsStruct' = 'false', 'delta.autoOptimize.optimizeWrite' = 'true')"
preserve_col_names=yes;

proc sql;
create table dbx.test as
	select * from test;
run;

The performance leaves a lot to be desired unfortunately. As an example, a dataset with 122 variables and 127 thousand observations takes between five and six minutes to upload, whereas it takes less than 20 seconds to upload to a Microsoft SQL Server DB (also via ODBC).

 

Is this expected? According to the documentation, Databricks can also be accessed via JDBC and Spark SAS ACCESS modules, but we unfortunately do not have them licensed. Are there any options which could improve ODBC upload performance? I tried increasing insertbuff and dbcommit further, but I get the following error when going above 12k:

ERROR: CLI execute error: [Simba][Hardy] (130) An error occurred while an INSERT statement which causes the driver to reconnect to the server.

Thanks for your help in advance!

7 REPLIES 7
LinusH
Tourmaline | Level 20

There are a lot of parameters that can affect performance in such use case.

If few questions:

- Where does your comparison SQL Server reside (vs Databricks)?

- Have you tried to load this data by other means into Databricks? Locally (to rule out actual load problems), remotely (compare with SAS).

Some hints might be given if you active tracing:

options sastrace=',,,d' sastraceloc=saslog nostsuffix msglevel=i;

 

Data never sleeps
js5
Pyrite | Level 9 js5
Pyrite | Level 9

The architecture is somewhat complex due to historical reasons:

  • the library I am copying from is locaten on an on-prem network share
  • SAS runs on AWS EC2
  • SQL Server runs on an on-prem VM
  • Databricks runs on AWS

Creating the table directly in databricks (CTAS from the SQL Server table connected as catalog) takes two minutes. So still much longer than creating a SQL Server table, but nevertheless two to three times faster than creating it via ODBC.

Ksharp
Super User

Did you try to use PROC COPY instead of PROC SQL?

proc sql;
create table dbx.test as
	select * from test;
run;
------------>
proc copy in=work out=dbx;
select test;
run;

 

js5
Pyrite | Level 9 js5
Pyrite | Level 9
There is no meaningful difference when using proc datasets. This is likely explained by the fact that it still uses SQL in the background according to the ODBC trace.
SASKiwi
PROC Star

What is the download performance for the same table? If both are taking a similar amount of time then I suggest that it is your network connection to Databricks that is the problem.

js5
Pyrite | Level 9 js5
Pyrite | Level 9

Download performance is not an issue. If it does not fail due to insufficient memory (I need to investigate why this happens), the entire table is downloaded in around 10 seconds.

SASKiwi
PROC Star

In my experience tuning INSERTBUFF works for SQL Server but it looks like to be not the case with Databricks. Have you talked to the Databricks DBAs to see if they have any ideas about the slowness? It might be worth trying bulk loading to see if that helps.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 268 views
  • 0 likes
  • 4 in conversation