
Preventing Data Explosion When Reading Snowflake Data into SAS Viya 4.0 Using SCANSTRINGCOLUMNS=


The Challenge

 

When Snowflake VARCHAR columns lack defined lengths, SAS assigns them its maximum character size (32,767). This can lead to excessive memory usage, long run times, or even out-of-memory errors.
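
To see the symptom on your own system, run PROC CONTENTS against a Snowflake table through your libref. SNOW.MY_TABLE below is a hypothetical libref and table name standing in for any Snowflake table with undefined VARCHAR columns:

/* SNOW.MY_TABLE is a placeholder for a Snowflake table with undefined VARCHAR columns */
PROC CONTENTS data=SNOW.MY_TABLE;
run;

/* every undefined VARCHAR column is reported with Len = 32767 */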

 

The Solution

 

The LIBNAME or data set option SCANSTRINGCOLUMNS=YES scans all VARCHAR columns in a Snowflake table to determine each column's actual maximum length before loading the data into SAS.  That means a two-character STATECODE (e.g., NY) with an undefined length in Snowflake loads as 2 characters instead of 32,767.  This option can significantly reduce the size of the resulting table and can accelerate the loading process.
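
As a minimal sketch, you can set the option once on the LIBNAME statement so every table read through that libref is scanned; the server, credentials, database, and schema below are placeholders for your own connection details:

libname SNOW snow
    server='myaccount.snowflakecomputing.com'
    user=&sf_user password=&sf_pwd
    database=MYDB schema=PUBLIC
    scanstringcolumns=yes; /* scan VARCHAR columns before describing tables to SAS */

The same option also works per table as a data set option, which is the form used in Step 4 below.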

 

An Example

 

I created a copy of SASHELP.ZIPCODE in Snowflake and changed all the character columns to VARCHAR with unspecified lengths.  SASHELP.ZIPCODE has 40,938 rows and 21 columns (13 of which are character type), with a file size of 35 MB.

 

I then copied the Snowflake table back to SAS, first without and then with the SCANSTRINGCOLUMNS=YES option:

  • Without the SCANSTRINGCOLUMNS=YES option, the read to SAS took 22 seconds and the file size grew to 16 GB, with all 13 character columns at length 32,767.  That's more than a 450-fold increase in size!
  • With the SCANSTRINGCOLUMNS=YES option, the read took only 4 seconds and the file size actually shrank to 33 MB.  That's a 99.8% decrease from the size without the option!

 

Code to test on your environment

 

Use the following code to create the data sets and replicate the example described above.  I used SAS/ACCESS® Interface to Snowflake and wrote my code in SAS Studio on Viya 4.0.

 

Step 1: Add system options to enhance performance statistics in the log.

OPTIONS
    SASTRACE=',,,ds'            /* trace SQL statements sent to the DBMS */
    SASTRACELOC=SASLOG          /* write the trace output to the SAS log */
    NOSTSUFFIX                  /* omit timestamp suffixes on trace records */
    SQL_IP_TRACE=(note, source) /* show notes and generated SQL for implicit pass-through */
    MSGLEVEL=i                  /* write additional informational notes */
    FULLSTIMER;                 /* report full performance statistics */

Step 2: Create a copy of SASHELP.ZIPCODE in Snowflake with undefined VARCHAR lengths.

/* Each *_U column is declared VARCHAR(*) so it lands in Snowflake with no defined length */
data SNOW.ZIP_UNDEFINED (keep=ZIP X Y ZIP_CLASS_U CITY_U STATE STATECODE_U
    STATENAME_U COUNTY COUNTYNM_U MSA AREACODE AREACODES_U TIMEZONE_U GMTOFFSET
    DST_U PONAME_U ALIAS_CITY_U ALIAS_CITYN_U CITY2_U STATENAME2_U);
    length
        ZIP_CLASS_U varchar(*)
        CITY_U varchar(*)
        STATECODE_U varchar(*)
        STATENAME_U varchar(*)
        COUNTYNM_U varchar(*)
        AREACODES_U varchar(*)
        TIMEZONE_U varchar(*)
        DST_U varchar(*)
        PONAME_U varchar(*)
        ALIAS_CITY_U varchar(*)
        ALIAS_CITYN_U varchar(*)
        CITY2_U varchar(*)
        STATENAME2_U varchar(*)
        ;
    set SASHELP.ZIPCODE;
    /* copy each original character column into its undefined-length counterpart */
    ZIP_CLASS_U = ZIP_CLASS;
    CITY_U = CITY;
    STATECODE_U = STATECODE;
    STATENAME_U = STATENAME;
    COUNTYNM_U = COUNTYNM;
    AREACODES_U = AREACODES;
    TIMEZONE_U = TIMEZONE;
    DST_U = DST;
    PONAME_U = PONAME;
    ALIAS_CITY_U = ALIAS_CITY;
    ALIAS_CITYN_U = ALIAS_CITYN;
    CITY2_U = CITY2;
    STATENAME2_U = STATENAME2;
run;

Step 3: Copy the created Snowflake table back to SAS without SCANSTRINGCOLUMNS=YES. Review the log for real time, then run PROC CONTENTS on the SAS table to view its size.

data work.sf_zip_1;
        set SNOW.ZIP_UNDEFINED;
run;

/*log results - real time 22.25 seconds with bulkunload*/

PROC CONTENTS data=work.sf_zip_1;
run;

/* reveals 21 variables and 40938 rows; uncompressed, UTF-8 encoding, 16 GB file size. All character variables have length 32767 */

Step 4: Copy the created Snowflake table back to SAS with SCANSTRINGCOLUMNS=YES. Review the log for real time, then run PROC CONTENTS on the SAS table to view its size.

data work.sf_zip_2;
        set SNOW.ZIP_UNDEFINED
        (SCANSTRINGCOLUMNS=YES);
run;   

/*log results - real time 4.36 seconds with bulkunload*/

PROC CONTENTS data=work.sf_zip_2;
run;

/* reveals 21 variables and 40938 rows; uncompressed, UTF-8 encoding, 33 MB file size. The longest character variable is ALIAS_CITYN_U (length 300), followed by ALIAS_CITY_U (284). */


Comments

Since this option doesn't work in the SAS 9.4 Snowflake connector, I instead use this to trim VARCHAR columns to a reasonable size:

 

conopts = 'default_varchar_size=256'

default_varchar_size is a Snowflake option you can set from SAS.
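
For context, here is a minimal sketch of how that connection option rides along on a SAS 9.4 LIBNAME statement; the libref, server, credentials, database, and schema are placeholders for your own connection details:

libname snow94 snow
    server='myaccount.snowflakecomputing.com'
    user=&sf_user password=&sf_pwd
    database=MYDB schema=PUBLIC
    conopts='default_varchar_size=256'; /* undefined VARCHARs surface to SAS as length 256 */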

 

I also keep the SAS COMPRESS= system option turned on to ensure Snowflake tables read into SAS don't blow out in size.
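
For reference, a one-line sketch of that setting:

options compress=yes; /* compress observations in output SAS data sets to curb on-disk growth */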

 
