05-11-2018 08:02 PM
We are setting up a new SAS FAW environment that is connecting to Snowflake (ODBC) and S3 as our data sources.
Opening a Snowflake table in SAS Enterprise Guide 7.15 takes a really long time (5-16 hours) for medium sized tables, Character variable length in Snowflake seems to be one of the reasons, being this: VARCHAR(16777216) the default length for character variables in Snowflake
We have tried creating a SAS view, which solves the speed problem, but it demands a manually intensive process to determine the right length of each character variable and it only works after the Snowflake table has been completely scanned which could take a long time, or simply does not finish.
Are there any other approaches or configuration changes to the ODBC parameters that could help?
Is somebody else facing the same issues when connecting to Snowflake using ODBC?
a month ago - last edited a month ago
TL;DR - best practice is to set the lengths when creating variable length character columns.
The problem you are experiencing is likely caused by not putting a length on VARCHAR column definition. The EG issue is separate but exacerbated by the text column lengths expanding. It is not a good idea to open an entire table in an EG viewer because it is going to read the entire thing into SAS. At a minimum it is best to limit the number of rows displayed in EG (Tools --> Options --> Query --> Number of rows to process in preview results window).
It is important to understand that these issues are not unique to Snowflake it can happen with many DBMSs.
Here is an example showing the character length issue:
libname snowdmax odbc dsn='snowflake_DSN' schema=PUBLIC user=myuser pw=passwd123 dbcommit=0 autocommit=no readbuff=100 insertbuff=100 dbmax_text=10;
/* This code can be used to spit out the ODBC driver options */ libname prompt odbc prompt=yes; %put %superq(sysdbmsg); /* notice I am not setting the lengths here */ data snow.cars_dbtype (dbtype=(make='string', model='varchar', type='varchar',origin='text')); set sashelp.cars; run;
Let's see what the columns look like.
Notice: 32K columns. This is not good.
Let's create a SAS data set from the Snowflake table and see if DBMAX_TEXT= helps.
data work.snow_cars; set snow.cars_dbtype; run;
/* Lets compare it to one with length information */
Here is the result... Notice the size difference... DBMAX_TEXT= didn't help. It only works on BLOBs. I think this is a problem and am going to try to do something about it. Some ODBC drivers will let you limit the length of text strings. Unfortunately, the Snowflake ODBC driver doesn't appear to be one of these. I will ask them about it.
Back to the size...
It's a huge difference. This problem will cause performance issues when reading and writing. It also causes one of the sneakiest SAS issues I have ever seen. It happened with Hadoop but this is the same issue.
A call came into SAS Tech Support concerned about SASWORK running out of space. It took a while to discover that expanding character column lengths was causing the problem. Initially, the customer added space to SASWORK. It didn't address the problem although it did postpone it for a while.
So, the best practice is to set the lengths when creating variable length character columns.