The Challenge
When Snowflake VARCHAR columns lack defined lengths, SAS assigns them its maximum character size (32,767). This can lead to excessive memory usage, long run times, or even out-of-memory errors.
The Solution
The SCANSTRINGCOLUMNS=YES LIBNAME and data set option scans all VARCHAR columns in a Snowflake table to determine their actual maximum lengths before loading the data into SAS. That means a two-character STATECODE (e.g., NY) with an undefined length in Snowflake loads as 2 characters instead of 32,767. This option can significantly reduce the size of the resulting table and can accelerate the loading process.
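The option can be set once on the LIBNAME statement, so that every table read through the libref is scanned, or per table as a data set option (the form used in Step 4 below). Here is a minimal LIBNAME sketch, assuming a libref named SNOW; the server, database, warehouse, and credential values are placeholders for your site's connection:
libname SNOW snow server="myaccount.snowflakecomputing.com"
   database=MYDB schema=PUBLIC warehouse=MYWH
   user=myuser password="XXXXXXXX"
   scanstringcolumns=yes;  /* scan VARCHAR columns for their actual maximum lengths on read */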
An Example
I created a copy of SASHELP.ZIPCODE in Snowflake and changed all the character columns to VARCHAR with unspecified lengths. SASHELP.ZIPCODE has 40938 rows and 21 columns (13 of which are character), with a file size of 35 MB.
I then copied the Snowflake table back to SAS without and with the SCANSTRINGCOLUMNS=YES option:
Code to test on your environment
Use the following code to create the data sets and replicate the example described above. I used the SAS/ACCESS® Interface to Snowflake and wrote my code in SAS Studio on SAS Viya 4.
Step 1: Add system options to enhance performance statistics in the log.
OPTIONS
   SASTRACE=',,,ds'            /* trace DBMS calls with summary timing */
   SASTRACELOC=SASLOG          /* send the trace output to the SAS log */
   NOSTSUFFIX                  /* keep trace lines short by dropping the timestamp suffix */
   SQL_IP_TRACE=(note,source)  /* note whether implicit pass-through occurred and show the generated SQL */
   MSGLEVEL=I                  /* surface informational notes */
   FULLSTIMER;                 /* report full performance statistics for each step */
Step 2: Create a copy of SASHELP.ZIPCODE in Snowflake with undefined varchar lengths.
data SNOW.ZIP_UNDEFINED (keep= ZIP X Y ZIP_CLASS_U CITY_U STATE STATECODE_U
      STATENAME_U COUNTY COUNTYNM_U MSA AREACODE AREACODES_U TIMEZONE_U GMTOFFSET
      DST_U PONAME_U ALIAS_CITY_U ALIAS_CITYN_U CITY2_U STATENAME2_U);
   /* declare the new character columns as VARCHAR with no length
      so they are created in Snowflake with an undefined size */
   length
      ZIP_CLASS_U   varchar(*)
      CITY_U        varchar(*)
      STATECODE_U   varchar(*)
      STATENAME_U   varchar(*)
      COUNTYNM_U    varchar(*)
      AREACODES_U   varchar(*)
      TIMEZONE_U    varchar(*)
      DST_U         varchar(*)
      PONAME_U      varchar(*)
      ALIAS_CITY_U  varchar(*)
      ALIAS_CITYN_U varchar(*)
      CITY2_U       varchar(*)
      STATENAME2_U  varchar(*)
   ;
   set SASHELP.ZIPCODE;
   /* copy each original character column into its undefined-length counterpart */
   ZIP_CLASS_U   = ZIP_CLASS;
   CITY_U        = CITY;
   STATECODE_U   = STATECODE;
   STATENAME_U   = STATENAME;
   COUNTYNM_U    = COUNTYNM;
   AREACODES_U   = AREACODES;
   TIMEZONE_U    = TIMEZONE;
   DST_U         = DST;
   PONAME_U      = PONAME;
   ALIAS_CITY_U  = ALIAS_CITY;
   ALIAS_CITYN_U = ALIAS_CITYN;
   CITY2_U       = CITY2;
   STATENAME2_U  = STATENAME2;
run;
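Before copying anything back, you can check how SAS interprets the undefined columns by having PROC SQL write the table definition to the log; with no scan in effect, each VARCHAR should surface as a 32,767-byte character column. This check works as written once the SNOW libref and table exist:
proc sql;
   /* writes the CREATE TABLE definition, as SAS sees it, to the log */
   describe table SNOW.ZIP_UNDEFINED;
quit;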
Step 3: Copy the Snowflake table back to SAS without SCANSTRINGCOLUMNS=YES. Review the log for real time, and run PROC CONTENTS on the SAS table to view its size.
data work.sf_zip_1;
set SNOW.ZIP_UNDEFINED;
run;
/*log results - real time 22.25 seconds with bulkunload*/
PROC CONTENTS data=work.sf_zip_1;
run;
/* reveals 21 variables and 40938 rows; uncompressed, UTF-8 encoding, 16 GB file size. Every character variable has length 32767. */
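If you would rather see just the length information, DICTIONARY.COLUMNS returns the same detail as a compact query; this works as written once WORK.SF_ZIP_1 exists:
proc sql;
   /* list character variables by declared length, longest first */
   select name, type, length
      from dictionary.columns
      where libname='WORK' and memname='SF_ZIP_1' and type='char'
      order by length desc;
quit;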
Step 4: Copy the Snowflake table back to SAS with SCANSTRINGCOLUMNS=YES. Again, review the log for real time and run PROC CONTENTS on the SAS table to view its size.
data work.sf_zip_2;
set SNOW.ZIP_UNDEFINED
(SCANSTRINGCOLUMNS=YES);
run;
/*log results - real time 4.36 seconds with bulkunload*/
PROC CONTENTS data=work.sf_zip_2;
run;
/* reveals 21 variables and 40938 rows; uncompressed, UTF-8 encoding, 33 MB file size. The longest character variable is ALIAS_CITYN_U (300), followed by ALIAS_CITY_U (284). */
A Workaround for SAS 9.4
SCANSTRINGCOLUMNS= isn't available in the SAS 9.4 Snowflake connector, so there I instead trim VARCHAR columns to a reasonable size with the connection option
conopts='default_varchar_size=256'
DEFAULT_VARCHAR_SIZE is a Snowflake option you can set from SAS. I also keep the SAS COMPRESS option on so that Snowflake tables read into SAS don't blow out in size.
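Putting that workaround together, here is a sketch of a SAS 9.4 setup; the libref name, server, and credential values are placeholders for your site's connection:
/* DEFAULT_VARCHAR_SIZE is passed to the Snowflake driver, so undefined
   VARCHAR columns surface in SAS as length 256 instead of 32,767 */
libname SNOW snow server="myaccount.snowflakecomputing.com"
   database=MYDB schema=PUBLIC user=myuser password="XXXXXXXX"
   conopts='default_varchar_size=256';

/* compress SAS data sets so wide character columns don't inflate file size */
options compress=yes;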