David_Billa
Rhodochrosite | Level 12

Is there any better method than the SELECT DISTINCT approach shown below to create a table in the database from SAS? I tried PROC SUMMARY, but it fails on some of the larger tables.

 

%macro test_loop(
  type_list =
);
  %local i pair current_distinct_var_list prompt_tbl_name;
  proc sql noprint;
    connect using &db_lib as _snow_temp;

    /* Loop over the space-delimited var1*var2 pairs in &type_list */
    %do i = 1 %to %sysfunc(countw(&type_list, %str( )));
      %let pair = %scan(&type_list, &i, %str( ));
      /* product_family*upn -> variable list "product_family,upn" */
      %let current_distinct_var_list = %sysfunc(translate(&pair, %str(,), *));
      /* product_family*upn -> table name "product_family_upn" */
      %let prompt_tbl_name = %sysfunc(translate(&pair, _, *));

      execute(
        create or replace table &schema_temp..&prompt_tbl_name as
          select distinct &current_distinct_var_list
          from &schema_temp..&source_tbl
      ) by _snow_temp;

      create table DATA.&prompt_tbl_name as
        select * from connection to _snow_temp (
          select *
          from &schema_temp..&prompt_tbl_name
        )
      ;
    %end;
  quit;
%mend test_loop;

%let type_list =   
  product_family*upn 
  product_family*region 
  plant_id*batch 
  batch*region 
  batch*event_region  
;
%test_loop(
  type_list = &type_list
)
ACCEPTED SOLUTION

Patrick
Opal | Level 21

If you just need the distinct list of rows on the SAS side and you've got a LIBNAME defined, then code like the one below would do the job as well.

options sastrace=',,,d' sastraceloc=saslog nostsuffix;
LIBNAME <libref> snow <connection-options> <LIBNAME-options>;

proc sql;
  create table work.distinct_rows as
  select distinct &current_distinct_var_list
  from <libref>.&source_tbl
  ;
quit;

SAS will attempt to push as much of the processing as possible to the database. 

The options set in the sample code above will show you in the SAS log what has been pushed to the database.

As long as you don't use SAS functions that SAS can't translate into database functions, the whole query will run in Snowflake.

The functions SAS can push to the database are documented here.
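
For example, here is an illustrative sketch (not from the original post): UPCASE is one of the functions SAS can typically map to a database equivalent, so a query like this one should be passed through in full, and the SASTRACE output in the log will show whether it was.

proc sql;
  /* UPCASE maps to a database UPPER(), so SAS should be able to pass
     the whole query to Snowflake. A SAS-only function with no database
     equivalent would instead force SAS to pull the rows back and
     process them locally. */
  create table work.distinct_upcased as
  select distinct upcase(product_family) as product_family
  from <libref>.&source_tbl
  ;
quit;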

 


10 REPLIES
Tom
Super User

Do you actually need to create both a SNOWFLAKE table

create or replace table &schema_temp..&prompt_tbl_name 

and a SAS dataset?

create table DATA.&prompt_tbl_name as

If you only want to create the SAS dataset, then skip the step that creates the Snowflake table and just use that query in the second step.

      create table DATA.&prompt_tbl_name as
        select * from connection to _snow_temp (
          select distinct &current_distinct_var_list
          from &schema_temp..&source_tbl
        )
      ;
David_Billa
Rhodochrosite | Level 12
Yes, I was asked to create the tables as shown in the initial post.

However, I'd like to seek your help in improving the code further with any more efficient method available.
Patrick
Opal | Level 21

The code portion that creates the table in Snowflake is 100% Snowflake SQL. If you believe that doesn't perform well enough, then you need to consult the Snowflake documentation or ask a question in a Snowflake forum.

 

For loading data from a database to SAS, there are normally three things you can look into to improve elapsed times:

1. The setting of the SAS library option READBUFF (the default is almost always too low).

2. The database starting to send data before the query completes (that's a database setting, so consult the Snowflake documentation or a Snowflake forum; not sure what's applicable for Snowflake).

3. Increased parallelism (again fully on the database side, so Snowflake documentation/forum; given it's Snowflake, this is likely already set "optimally").

Additionally (and not sure what applies to Snowflake), you could also look into "dirty reads" (uncommitted reads) and, if using committed reads, how often you commit (look also into the SAS option AUTOCOMMIT and the LIBNAME option DBCOMMIT). Fewer commits will improve performance as long as you can be confident there aren't write operations happening on the table while you're reading.

David_Billa
Rhodochrosite | Level 12
Thank you for the information. How do I set the READBUFF parameter?
Patrick
Opal | Level 21

@David_Billa wrote:
Thank you for the information. How do I set the READBUFF parameter?

Data Set Options for Snowflake

LIBNAME Statement for the Snowflake Engine
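
For example, a minimal sketch (connection options are placeholders and the buffer sizes are illustrative; tune them for your row width and network):

/* Set READBUFF for every table read through the library... */
LIBNAME snow_lib snow <connection-options> readbuff=10000;

/* ...or override it per table as a data set option. */
proc sql;
  create table work.local_copy as
  select * from snow_lib.&source_tbl(readbuff=30000);
quit;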

 

You could also try PROC APPEND for loading the data into SAS. Depending on where the bottleneck is, this could improve elapsed times.

And you could also try bulk loading to SAS. This might - or might not - improve elapsed times. You will only know by trying.
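
For the PROC APPEND route, a minimal sketch (dataset and libref names are placeholders):

/* Pull the Snowflake table into a SAS dataset; if the base dataset
   does not exist yet, PROC APPEND creates it. */
proc append base=work.snow_extract data=snow_lib.&source_tbl;
run;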

David_Billa
Rhodochrosite | Level 12

@Patrick any simple example of bulk loading to SAS?

Tom
Super User

@David_Billa wrote:

@Patrick any simple example of bulk loading to SAS?


Not sure about Snowflake, but with Redshift I found it faster to use its COPY command, which could write multiple delimited text files to an S3 bucket. This allowed the database to generate the files very quickly. I could then copy the files to some place local to the SAS server and use a simple data step to read them in.

 

The code to read in a series of delimited text files is very simple when you have an example of the known structure: define the variables (in the right ORDER!), read the list of filenames, and for each filename read the observations from that file.

So if you have an existing dataset with the variables defined properly, say it is named TEMPLATE, and a dataset named FILELIST with a variable named FILENAME, the step looks like this:

data want;
  if 0 then set template;  /* copy the variable definitions without reading any rows */
  set filelist;            /* one observation per file to read */
  do while(not eof);
      infile dummy filevar=filename dsd truncover firstobs=1 end=eof;
      input (_all_) (+0);  /* list input of every defined variable */
      output;
  end;
run;

Here is an example you can run to see how it works in action:

filename csv temp;

/* Write SASHELP.CLASS out as a delimited text file. */
data _null_;
  set sashelp.class;
  file csv dsd;
  put (_all_) (+0);
run;

/* Build the list of files to read (just one here). */
data filelist;
  filename=pathname('csv');
run;

data want;
  if 0 then set sashelp.class;
  set filelist;
  infile dummy filevar=filename dsd truncover firstobs=1 end=eof;
  do while (not eof);
    input (_all_) (+0);
    output;
  end;
run;

proc print;
run;

 

 

SASKiwi
PROC Star

@David_Billa - SAS doesn't have bulk load functionality like a relational database does. Reading external files produced by a relational database into SAS via a DATA step would be the equivalent process.

 

Also, I suggest you try the DBCOMMIT option on your LIBNAME or CONNECT statements when creating Snowflake tables. This option can greatly improve table load times when you have a lot of data.
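
For example, a minimal sketch (connection options and dataset names are placeholders; DBCOMMIT=0 commits once, after the last row is loaded, instead of every few thousand rows):

LIBNAME snow_out snow <connection-options> dbcommit=0;

/* Load a SAS dataset into a Snowflake table with a single commit at the end. */
data snow_out.my_table;
  set work.source_data;
run;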

Patrick
Opal | Level 21

@David_Billa wrote:

@Patrick any simple example of bulk loading to SAS?


Ideally, do some searching. There are whitepapers and sample code out there that explain it much better than I could.

You might need additional bulkload options, which are documented under the link already provided, but in a nutshell it's:

/* BASE= points at the database table; BULKLOAD=YES switches on the
   engine's bulk-load facility for the load. */
proc append data=source base=target(bulkload=yes);
run;

