Hello, we have been experimenting with uploading results generated with SAS to a Databricks SQL warehouse. The code looks as follows:

libname dbx odbc prompt="Driver={Simba Spark ODBC Driver};
Host=foo.cloud.databricks.com;
Port=443;
HTTPPath=/sql/1.0/warehouses/bar;
SSL=1;
ThriftTransport=2;
AuthMech=3;
UID=token;
PWD=baz;
Catalog=foofoo;
Schema=barbar;
DefaultStringColumnLength=32767"
dbcommit=10000
insertbuff=10000
readbuff=1000
dbcreate_table_opts="TBLPROPERTIES('delta.columnMapping.mode' = 'name', 'delta.checkpoint.writeStatsAsStruct' = 'false', 'delta.autoOptimize.optimizeWrite' = 'true')"
preserve_col_names=yes;
proc sql;
create table dbx.test as
select * from test;
quit;

Unfortunately, the performance leaves a lot to be desired. As an example, a dataset with 122 variables and 127,000 observations takes five to six minutes to upload, whereas uploading it to a Microsoft SQL Server database (also via ODBC) takes less than 20 seconds. Is this expected? According to the documentation, Databricks can also be accessed via the JDBC and Spark SAS/ACCESS modules, but unfortunately we do not have those licensed. Are there any options that could improve ODBC upload performance? I tried increasing INSERTBUFF and DBCOMMIT further, but when going above 12,000 I get the following error:

ERROR: CLI execute error: [Simba][Hardy] (130) An error occurred while an INSERT statement which causes the driver to reconnect to the server.

Thanks for your help in advance!
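For context on why INSERTBUFF matters here: with a small buffer the driver issues one round trip per handful of rows, so upload time is dominated by network latency rather than data volume. The idea behind row batching can be sketched with a small Python helper (pure string-building, no live connection; the table and column names are placeholders, not anything from our actual schema):

```python
def batch_insert_statements(table, columns, rows, batch_size=10000):
    """Group rows into multi-row INSERT statements so that each
    round trip to the server carries batch_size rows instead of one."""
    col_list = ", ".join(columns)
    stmts = []
    for i in range(0, len(rows), batch_size):
        chunk = rows[i:i + batch_size]
        values = ", ".join(
            "(" + ", ".join(repr(v) for v in row) + ")" for row in chunk
        )
        stmts.append(f"INSERT INTO {table} ({col_list}) VALUES {values}")
    return stmts

# 127,000 rows at 10,000 rows per statement -> 13 round trips
# instead of 127,000 single-row inserts.
rows = [(i, i * 2) for i in range(127000)]
stmts = batch_insert_statements("dbx.test", ["a", "b"], rows, 10000)
print(len(stmts))
```

This is only an illustration of the batching principle; the SAS/ACCESS engine and the Simba driver handle the actual buffering internally via INSERTBUFF, so the practical lever is the option value, not hand-built SQL.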