Is there any better method than the 'select distinct' approach shown below to create a table in the database from SAS? I tried PROC SUMMARY, but it fails on some of the larger tables.
%macro test_loop(type_list=);
   %local i current_distinct_var_list;
   /* Assumption: the posted code had a stray %end, so the loop over the
      space-delimited pairs in &type_list is reconstructed here, converting
      each a*b entry into a comma-separated variable list */
   %do i = 1 %to %sysfunc(countw(&type_list, %str( )));
      %let current_distinct_var_list =
         %sysfunc(translate(%scan(&type_list, &i, %str( )), %str(,), %str(*)));
      proc sql noprint;
         connect using &db_lib as _snow_temp;
         execute(
            create or replace table &schema_temp..&prompt_tbl_name as
            select distinct &current_distinct_var_list
            from &schema_temp..&source_tbl
         ) by _snow_temp;
         create table DATA.&prompt_table_name as
            select * from connection to _snow_temp
            ( select * from &schema_temp..&prompt_tbl_name )
         ;
      quit;
   %end;
%mend test_loop;

%let type_list = product_family*upn product_family*region plant_id*batch batch*region batch*event_region;
%test_loop(type_list=&type_list)
Accepted Solutions
If you just need the distinct list of rows on the SAS side and you've got a libname defined, then code like the below would do the job as well.
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

LIBNAME <libref> snow <connection-options> <LIBNAME-options>;

proc sql noprint;
   select distinct &current_distinct_var_list
      from <libref>.&source_tbl
   ;
quit;
SAS will attempt to push as much of the processing as possible to the database.
The options set in the sample code above will show you in the SAS log what has been pushed to the database.
As long as you don't use SAS functions that SAS can't translate into database functions, the whole SQL will run in Snowflake.
Which functions SAS can push to the database is documented here.
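As a hedged illustration (the specific pushdown behavior shown is an assumption about typical SAS/ACCESS translations; confirm against the SASTRACE output for your engine and release):

/* UPCASE typically maps to the database UPPER(), so the whole query
   can be passed through to Snowflake */
proc sql noprint;
   select distinct upcase(region)
      from <libref>.&source_tbl;
quit;

/* PRXMATCH has no database translation, so SAS would pull the rows
   back and evaluate the WHERE clause locally */
proc sql noprint;
   select distinct region
      from <libref>.&source_tbl
      where prxmatch('/^EU/', region) > 0;
quit;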
Do you actually need to create both a SNOWFLAKE table
create or replace table &schema_temp..&prompt_tbl_name
and a SAS dataset?
create table DATA.&prompt_table_name as
If you only want to create the dataset, then skip the step that creates the table and just use that query in the second step:
create table DATA.&prompt_table_name as
select * from connection to _snow_temp (
select distinct &current_distinct_var_list
from &schema_temp..&source_tbl
)
;
However, I'd like to seek your help in improving the code further with any more efficient method available.
The code portion that creates the table in Snowflake is 100% Snowflake SQL. If you believe that doesn't perform well enough, then you need to consult the Snowflake documentation or ask a question in a Snowflake forum.
For loading data from a database into SAS, there are normally three things you can look into to improve elapsed times:
1. The setting of the SAS library parameter READBUFF (the default is almost always too low).
2. Having the database start to send data before the query completes (that's a database setting, so Snowflake documentation/forum; not sure what's applicable for Snowflake).
3. Increased parallelism (that's again fully on the database side, so Snowflake documentation/forum; given it's Snowflake, this is likely already set optimally).
Additionally (and I'm not sure what applies to Snowflake), you could also look into "dirty reads" (uncommitted reads) and, if using committed reads, how often you commit (look also into the SAS option for autocommit and the libname option DBCOMMIT). Fewer commits will improve performance as long as you can be confident that there are no write operations happening on the table while you're reading. A sketch of the first and last points follows.
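A hedged sketch (connection options and values are illustrative, not tuned recommendations):

libname snowlib snow <connection-options>
   readbuff=32000   /* rows fetched per network round trip; the default is much lower */
   dbcommit=0       /* 0 = a single commit when a load completes */
;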
@David_Billa wrote:
Thank you for the information. How do I set the READBUFF parameter?
Data Set Options for Snowflake
LIBNAME Statement for the Snowflake Engine
You could also try PROC APPEND for loading the data into SAS. Depending on where the bottleneck is, this could improve elapsed time.
And you could also try bulk loading. This might or might not improve elapsed time; you will only know by trying.
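For the PROC APPEND route, a minimal sketch (the libref and target dataset name are assumptions):

/* Copy the Snowflake table into a SAS dataset; WORK.WANT is created
   if it does not already exist */
proc append base=work.want data=snowlib.&source_tbl;
run;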
@Patrick, any simple example of how to bulk load to SAS?
@David_Billa wrote:
@Patrick, any simple example of how to bulk load to SAS?
Not sure about Snowflake, but with Redshift I found it faster to use their COPY command, which could write multiple delimited text files to an S3 bucket. This allowed the database to generate the files very quickly. I could then copy the files to some place local to the SAS server and use a simple DATA step to read them in.
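For Snowflake, the analogous unload appears to be COPY INTO <location>; a hedged sketch via explicit pass-through (the stage name and file-format options are assumptions):

proc sql noprint;
   connect using &db_lib as _snow_temp;
   /* Unload the table as CSV files into a named stage; by default
      Snowflake can split the output across multiple files */
   execute(
      copy into @my_stage/extract/
      from &schema_temp..&source_tbl
      file_format = (type = csv field_delimiter = ',')
   ) by _snow_temp;
   disconnect from _snow_temp;
quit;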
The code to read in a series of delimited text files is very simple when you have an example of the known structure. Define the variables (in the right ORDER!), read the list of filenames, and for each filename read the observations from that file.
So if you have an existing dataset with the variables defined properly, say it is named TEMPLATE, and a dataset named FILELIST with a variable named FILENAME, the step looks like this:
data want;
   if 0 then set template;
   set filelist;
   do while (not eof);
      infile dummy filevar=filename dsd truncover firstobs=1 end=eof;
      input (_all_) (+0);
      output;
   end;
run;
Here is an example you can run to see how it works in action:
filename csv temp;

data _null_;
   set sashelp.class;
   file csv dsd;
   put (_all_) (+0);
run;

data filelist;
   filename = pathname('csv');
run;

data want;
   if 0 then set sashelp.class;
   set filelist;
   infile dummy filevar=filename dsd truncover firstobs=1 end=eof;
   do while (not eof);
      input (_all_) (+0);
      output;
   end;
run;

proc print;
run;
@David_Billa, SAS doesn't have bulk-load functionality like a relational database does. Reading external files produced by a relational database into SAS via a DATA step would be the equivalent process.
Also, I suggest you try the DBCOMMIT option on your LIBNAME or CONNECT statements when creating Snowflake tables. This option can greatly improve table load times when you have a lot of data.
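For instance, a hedged sketch with DBCOMMIT as a dataset option on a Snowflake target (the libref, table names, and value are illustrative):

/* DBCOMMIT=0 issues a single commit when the load completes instead of
   committing every few thousand rows */
data snowlib.target_table (dbcommit=0);
   set work.source_table;
run;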
@David_Billa wrote:
@Patrick, any simple example of how to bulk load to SAS?
Ideally, do some searching. There are whitepapers and sample code out there that explain it much better than I could.
You might need additional bulkload options, which are documented under the link already provided, but in a nutshell it's:
proc append data=source base=target(bulkload=yes);
run;