Hi all,
I’m working on fetching large tables from Snowflake into SAS WORK datasets. Here’s my situation:
We have a pre-assigned SAS library (snow) for Snowflake access, defined with the following settings:
BULKLOAD=YES BULKUNLOAD=YES BL_INTERNAL_STAGE='@~' BL_DELETE_DATAFILE=YES BL_COMPRESS=YES BL_USE_ESCAPE=YES BL_NUM_READ_THREADS=8 /* 8 CPUs on a node */
This library does not define a database or schema, and all developers use this shared library.
I need to fetch data from a large Snowflake table (~200M rows, 7–8 columns).
I want to use BULKUNLOAD (with BL_NUM_DATA_FILES) to speed up the fetch, but I’m running into some limitations.
Current Pass-Through Approach
proc sql;
   connect using snow as snow;
   execute (use role SF_Role) by snow;
   execute (use warehouse M_VWH) by snow;
   create table work.test as
      select columnA, columnB
      from connection to snow
         (select * from database.schema.snowflake_table);
   disconnect from snow;
quit;
Observations:
The query itself executes quickly in Snowflake (~8 seconds), but fetching the result into SAS WORK takes 13+ minutes.
Because this is PROC SQL pass-through, the BULKUNLOAD/internal-stage mechanism is not used.
Using LIBNAME + DATA Step Works Fast
libname sf_bulk snow dsn=snowflake warehouse=M_VWH role=SF_Role
database=db schema=Schema authdomain="Snowflake"
BULKLOAD=YES BULKUNLOAD=YES
BL_INTERNAL_STAGE='@~'
BL_DELETE_DATAFILE=YES
BL_COMPRESS=YES
BL_USE_ESCAPE=YES
BL_NUM_READ_THREADS=8;
data work.test;
set sf_bulk.snowflake_table;
run;
This approach uses BULKUNLOAD and the internal stage, and the fetch completes in ~4 minutes.
I verified in Snowflake using LIST @~ that intermediate bulk files were created.
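(For reference, I ran that check via pass-through, roughly like this; the same LIST command can also be run in a Snowflake worksheet.)
proc sql;
   connect using sf_bulk as sf_bulk;
   /* List the files the bulk unload wrote to the user internal stage */
   select * from connection to sf_bulk (list @~);
   disconnect from sf_bulk;
quit;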
I cannot switch all of the code to DATA steps, because pass-through is needed for transformation logic that runs inside Snowflake.
So I thought of creating a temporary table in Snowflake with all the transformations applied, then fetching it via SAS:
libname sf_temp snow dsn=snowflake warehouse=M_VWH role=SF_Role
database=db schema=Schema authdomain="Snowflake"
BULKLOAD=YES BULKUNLOAD=YES
BL_INTERNAL_STAGE='@~'
BL_DELETE_DATAFILE=YES
BL_COMPRESS=YES
BL_USE_ESCAPE=YES
BL_NUM_READ_THREADS=8;
proc sql;
   connect using sf_temp as sf_temp;
   execute (
      create temporary table temp_mytable as
      select columnA, columnB
      from database.schema.snowflake_table
   ) by sf_temp;
   create table work.test as
      select *
      from connection to sf_temp
         (select * from database.schema.temp_mytable);
   disconnect from sf_temp;
quit;
Observations:
This works, but BULKUNLOAD and the internal stage are not used, so performance is slow.
I also tried reading the temp table with a DATA step in between, but it failed because the temp table was not available.
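In case it helps, the failing attempt looked roughly like this (a sketch from memory; my understanding is that the temporary table is scoped to the session that created it, so a DATA step through the libref, which opens its own connection, can no longer see it):
proc sql;
   connect using sf_temp as sf_temp;
   execute (
      create temporary table temp_mytable as
      select columnA, columnB
      from database.schema.snowflake_table
   ) by sf_temp;
   disconnect from sf_temp;
quit;

data work.test;
   /* Failed here: the temp table is session-scoped, so it is gone once the */
   /* pass-through session ends, and the libref uses its own connection.    */
   set sf_temp.temp_mytable;
run;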
Advice Needed
1. Is there any way to use BULKUNLOAD / internal stage with a temporary table via SAS/ACCESS?
2. Would DBMSTEMP= help in this scenario, or is it unsupported with Snowflake? (See the sketch after this list for what I have in mind.)
3. Are there any best practices for using temp tables and bulk fetch with pre-assigned libraries?
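For question 2, this is roughly what I have in mind, assuming DBMSTEMP=YES and CONNECTION=GLOBAL behave with the Snowflake engine the way they do with other SAS/ACCESS engines (I have not been able to confirm that, hence the question):
/* Sketch only: share one global connection so the pass-through session
   and the libref use the same Snowflake session and its temp tables.  */
libname sf_tmp2 snow dsn=snowflake warehouse=M_VWH role=SF_Role
        database=db schema=Schema authdomain="Snowflake"
        DBMSTEMP=YES CONNECTION=GLOBAL
        BULKUNLOAD=YES BL_INTERNAL_STAGE='@~'
        BL_COMPRESS=YES BL_NUM_READ_THREADS=8;

proc sql;
   connect using sf_tmp2 as sf_tmp2;
   execute (
      create temporary table temp_mytable as
      select columnA, columnB
      from database.schema.snowflake_table
   ) by sf_tmp2;
   /* no DISCONNECT here, so the global connection (and the temp table) stays alive */
quit;

/* If the connection really is shared, the bulk unload path should be used here */
data work.test;
   set sf_tmp2.temp_mytable;
run;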
Please let me know if you have a solution or suggestions for this. Thank you.