BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
curaloco
Fluorite | Level 6

We are setting up a new SAS FAW environment that is connecting to Snowflake (ODBC) and S3 as our data sources.

 

Opening a Snowflake table in SAS Enterprise Guide 7.15 takes a really long time (5-16 hours) for medium sized tables, Character variable length in Snowflake seems to be one of the reasons, being this: VARCHAR(16777216) the default length for character variables in Snowflake

We have tried creating a SAS view, which solves the speed problem, but it demands a manually intensive process to determine the right length of each character variable and it only works after the Snowflake table has been completely scanned which could take a long time, or simply does not finish.

Are there any other approaches or configuration changes to the ODBC parameters that could help? 

Is somebody else facing the same issues when connecting to Snowflake using ODBC?

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
JBailey
Barite | Level 11

Hi @curaloco,

 

TL;DR - best practice is to set the lengths when creating variable length character columns.

 

The problem you are experiencing is likely caused by not putting a length on VARCHAR column definition. The EG issue is separate but exacerbated by the text column lengths expanding. It is not a good idea to open an entire table in an EG viewer because it is going to read the entire thing into SAS. At a minimum it is best to limit the number of rows displayed in EG (Tools --> Options --> Query --> Number of rows to process in preview results window).

 

It is important to understand that these issues are not unique to Snowflake it can happen with many DBMSs.

 

Here is an example showing the character length issue:

 

libname snowdmax odbc dsn='snowflake_DSN' schema=PUBLIC  
                      user=myuser pw=passwd123
                      dbcommit=0 autocommit=no 
                      readbuff=100 insertbuff=100
                      dbmax_text=10;

/* This code can be used to spit out the ODBC driver options */ libname prompt odbc prompt=yes; %put %superq(sysdbmsg); /* notice I am not setting the lengths here */ data snow.cars_dbtype (dbtype=(make='string', model='varchar', type='varchar',origin='text')); set sashelp.cars; run;

Let's see what the columns look like.

 

columns_32k.jpg

 

Notice: 32K columns. This is not good. 

 

Let's create a SAS data set from the Snowflake table and see if DBMAX_TEXT= helps.

 

data work.snow_cars;
   set snow.cars_dbtype;
run;

/* Lets compare it to one with length information */

data snow.good_cars;
set sashelp.cars;
run;

data work.good_cars;
set snow.good_cars;
run;

 

Here is the result... Notice the size difference... DBMAX_TEXT= didn't help. It only works on BLOBs. I think this is a problem and am going to try to do something about it. Some ODBC drivers will let you limit the length of text strings. Unfortunately, the Snowflake ODBC driver doesn't appear to be one of these. I will ask them about it.

 

Back to the size...

 

snow_32k.jpg

It's a huge difference. This problem will cause performance issues when reading and writing. It also causes one of the sneakiest SAS issues I have ever seen. It happened with Hadoop but this is the same issue.

 

A call came into SAS Tech Support concerned about SASWORK running out of space. It took a while to discover that expanding character column lengths was causing the problem. Initially, the customer added space to SASWORK. It didn't address the problem although it did postpone it for a while.

 

So, the best practice is to set the lengths when creating variable length character columns. 

View solution in original post

10 REPLIES 10
LinusH
Tourmaline | Level 20
The post is created now @JBailey.
Data never sleeps
JBailey
Barite | Level 11

Hi @curaloco,

 

TL;DR - best practice is to set the lengths when creating variable length character columns.

 

The problem you are experiencing is likely caused by not putting a length on VARCHAR column definition. The EG issue is separate but exacerbated by the text column lengths expanding. It is not a good idea to open an entire table in an EG viewer because it is going to read the entire thing into SAS. At a minimum it is best to limit the number of rows displayed in EG (Tools --> Options --> Query --> Number of rows to process in preview results window).

 

It is important to understand that these issues are not unique to Snowflake it can happen with many DBMSs.

 

Here is an example showing the character length issue:

 

libname snowdmax odbc dsn='snowflake_DSN' schema=PUBLIC  
                      user=myuser pw=passwd123
                      dbcommit=0 autocommit=no 
                      readbuff=100 insertbuff=100
                      dbmax_text=10;

/* This code can be used to spit out the ODBC driver options */ libname prompt odbc prompt=yes; %put %superq(sysdbmsg); /* notice I am not setting the lengths here */ data snow.cars_dbtype (dbtype=(make='string', model='varchar', type='varchar',origin='text')); set sashelp.cars; run;

Let's see what the columns look like.

 

columns_32k.jpg

 

Notice: 32K columns. This is not good. 

 

Let's create a SAS data set from the Snowflake table and see if DBMAX_TEXT= helps.

 

data work.snow_cars;
   set snow.cars_dbtype;
run;

/* Lets compare it to one with length information */

data snow.good_cars;
set sashelp.cars;
run;

data work.good_cars;
set snow.good_cars;
run;

 

Here is the result... Notice the size difference... DBMAX_TEXT= didn't help. It only works on BLOBs. I think this is a problem and am going to try to do something about it. Some ODBC drivers will let you limit the length of text strings. Unfortunately, the Snowflake ODBC driver doesn't appear to be one of these. I will ask them about it.

 

Back to the size...

 

snow_32k.jpg

It's a huge difference. This problem will cause performance issues when reading and writing. It also causes one of the sneakiest SAS issues I have ever seen. It happened with Hadoop but this is the same issue.

 

A call came into SAS Tech Support concerned about SASWORK running out of space. It took a while to discover that expanding character column lengths was causing the problem. Initially, the customer added space to SASWORK. It didn't address the problem although it did postpone it for a while.

 

So, the best practice is to set the lengths when creating variable length character columns. 

curaloco
Fluorite | Level 6

Than you for the response. As you point out the current Snowflake ODBC connection available is susceptible to undefined VARCHAR lengths, SAS does not like them and it blows the available memory. Best way to deal with the VARCHAR lengths is altering them in Snowflake directly. I have to test your recommendation for limiting the number of rows displayed inside EG, just a heads up that, when trying to limit the number of rows, if you try to limit them using PROC SQL outobs=xxx; or DATA;SET  (obs=xxx) , it does not work. The whole table will get loaded into SAS memory and then it will show you the number of rows you requested, defeating the purpose of just loading a subset of the data.

 

Until SAS Connect is developed to replace the ODBC connection to Snowflake, I am using SQL pass thru for Snowflake, it works faster as it sends all the processing to Snowflake and just grabs the results back for SAS to display.

 

 

 

JBailey
Barite | Level 11

Hi @curaloco,

 

I am happy my post solved your problem.

 

The VARCHAR length issue is a problem with all SAS/ACCESS engines. Having a dedicated Snowflake engine will not magically fix it.

 

Best wishes,

Jeff

LinusH
Tourmaline | Level 20

But perhaps a VARCHAR data type in SAS would help the situation...

...which is available in Viya. Would be interesting to see if there are some lessons learned from a Viya - Snowflake (or any other DBMS with unspecified string lengths).

Data never sleeps
curaloco
Fluorite | Level 6

Thank you for that info Jeff, I wrongly assumed that the VARCHAR issue would be solved by SAS Connect replacing ODBC, I'll update my internal SAS request regarding the development of SAS Connect for Snowflake.

sai212
Fluorite | Level 6

@JBailey  Thanks for your inputs! How do we set these parameters while using Passthrough? With Greenplum, I was able to set lengths of the variables using VARCHAR option but this doesn't seem to work for Snowflake.

 

Something like this, FIRST_NAME :: VARCHAR(50) AS FIRST_NAME

Appreciate your help!

SASKiwi
PROC Star

@sai212  - Please don't post new problems in existing posts. Create a new post and add a link back to this one. In any case you should post your complete SAS log and not just bits of code.

sai212
Fluorite | Level 6
Sorry, these are my first posts and I wasn't aware.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 10869 views
  • 6 likes
  • 5 in conversation