Is there any better method than the 'select distinct' approach shown below to create a table in the database from SAS? I tried PROC SUMMARY, but it fails on some of the larger tables.
%macro test_loop(type_list=);
   %local i current_distinct_var_list;
   /* Assumption: the posted code had a stray %end, so the loop over the
      space-delimited pairs in &type_list is reconstructed here, converting
      each a*b entry into a comma-separated variable list */
   %do i = 1 %to %sysfunc(countw(&type_list, %str( )));
      %let current_distinct_var_list =
         %sysfunc(translate(%scan(&type_list, &i, %str( )), %str(,), %str(*)));
      proc sql noprint;
         connect using &db_lib as _snow_temp;
         execute(
            create or replace table &schema_temp..&prompt_tbl_name as
            select distinct &current_distinct_var_list
            from &schema_temp..&source_tbl
         ) by _snow_temp;
         create table DATA.&prompt_table_name as
            select * from connection to _snow_temp
            ( select * from &schema_temp..&prompt_tbl_name )
         ;
      quit;
   %end;
%mend test_loop;

%let type_list = product_family*upn product_family*region plant_id*batch batch*region batch*event_region;
%test_loop(type_list=&type_list)
Accepted Solutions
If you just need the distinct list of rows on the SAS side and you've got a libname defined, then code like the below would do the job as well.
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

LIBNAME <libref> snow <connection-options> <LIBNAME-options>;

proc sql noprint;
   select distinct &current_distinct_var_list
      from <libref>.&source_tbl
   ;
quit;
SAS will attempt to push as much of the processing as possible to the database.
The options set in the sample code above will show you in the SAS log what has been pushed to the database.
As long as you don't use SAS functions that SAS can't translate into database functions, the whole SQL will run in Snowflake.
Which functions SAS can push to the database is documented here.
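As a hedged illustration (the specific pushdown behavior shown is an assumption about typical SAS/ACCESS translations; confirm against the SASTRACE output for your engine and release):

/* UPCASE typically maps to the database UPPER(), so the whole query
   can be passed through to Snowflake */
proc sql noprint;
   select distinct upcase(region)
      from <libref>.&source_tbl;
quit;

/* PRXMATCH has no database translation, so SAS would pull the rows
   back and evaluate the WHERE clause locally */
proc sql noprint;
   select distinct region
      from <libref>.&source_tbl
      where prxmatch('/^EU/', region) > 0;
quit;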
Do you actually need to create both a SNOWFLAKE table
create or replace table &schema_temp..&prompt_tbl_name
and a SAS dataset?
create table DATA.&prompt_table_name as
If you only want to create the dataset, then skip the step that creates the table and just use that query in the second step:
create table DATA.&prompt_table_name as
select * from connection to _snow_temp (
select distinct &current_distinct_var_list
from &schema_temp..&source_tbl
)
;
However, I'd like to seek your help in improving the code further with any more efficient method available.
The code portion that creates the table in Snowflake is 100% Snowflake SQL. If you believe that doesn't perform well enough, then you need to consult the Snowflake documentation or ask a question in a Snowflake forum.
For loading data from a database into SAS, there are normally three things you can look into to improve elapsed times:
1. The setting of the SAS library parameter READBUFF (the default is almost always too low).
2. Having the database start to send data before the query completes (that's a database setting, so Snowflake documentation/forum; not sure what's applicable for Snowflake).
3. Increased parallelism (that's again fully on the database side, so Snowflake documentation/forum; given it's Snowflake, this is likely already set optimally).
Additionally (and I'm not sure what applies to Snowflake), you could also look into "dirty reads" (uncommitted reads) and, if using committed reads, how often you commit (look also into the SAS option for autocommit and the libname option DBCOMMIT). Fewer commits will improve performance as long as you can be confident that there are no write operations happening on the table while you're reading. A sketch of the first and last points follows.
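A hedged sketch (connection options and values are illustrative, not tuned recommendations):

libname snowlib snow <connection-options>
   readbuff=32000   /* rows fetched per network round trip; the default is much lower */
   dbcommit=0       /* 0 = a single commit when a load completes */
;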
@David_Billa wrote:
Thank you for the information. How do I set the READBUFF parameter?
Data Set Options for Snowflake
LIBNAME Statement for the Snowflake Engine
You could also try PROC APPEND for loading the data into SAS. Depending on where the bottleneck is, this could improve elapsed time.
And you could also try bulk loading. This might or might not improve elapsed time; you will only know by trying.
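For the PROC APPEND route, a minimal sketch (the libref and target dataset name are assumptions):

/* Copy the Snowflake table into a SAS dataset; WORK.WANT is created
   if it does not already exist */
proc append base=work.want data=snowlib.&source_tbl;
run;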
@Patrick, any simple example of how to bulk load to SAS?
@David_Billa wrote:
@Patrick, any simple example of how to bulk load to SAS?
Not sure about Snowflake, but with Redshift I found it faster to use their COPY command, which could write multiple delimited text files to an S3 bucket. This allowed the database to generate the files very quickly. I could then copy the files to some place local to the SAS server and use a simple DATA step to read them in.
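For Snowflake, the analogous unload appears to be COPY INTO <location>; a hedged sketch via explicit pass-through (the stage name and file-format options are assumptions):

proc sql noprint;
   connect using &db_lib as _snow_temp;
   /* Unload the table as CSV files into a named stage; by default
      Snowflake can split the output across multiple files */
   execute(
      copy into @my_stage/extract/
      from &schema_temp..&source_tbl
      file_format = (type = csv field_delimiter = ',')
   ) by _snow_temp;
   disconnect from _snow_temp;
quit;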
The code to read in a series of delimited text files is very simple when you have an example of the known structure. Define the variables (in the right ORDER!), read the list of filenames, and for each filename read the observations from that file.
So if you have an existing dataset with the variables defined properly, say it is named TEMPLATE, and a dataset named FILELIST with a variable named FILENAME, the step looks like this:
data want;
   if 0 then set template;
   set filelist;
   do while (not eof);
      infile dummy filevar=filename dsd truncover firstobs=1 end=eof;
      input (_all_) (+0);
      output;
   end;
run;
Here is an example you can run to see how it works in action:
filename csv temp;

data _null_;
   set sashelp.class;
   file csv dsd;
   put (_all_) (+0);
run;

data filelist;
   filename = pathname('csv');
run;

data want;
   if 0 then set sashelp.class;
   set filelist;
   infile dummy filevar=filename dsd truncover firstobs=1 end=eof;
   do while (not eof);
      input (_all_) (+0);
      output;
   end;
run;

proc print;
run;
@David_Billa, SAS doesn't have bulk-load functionality like a relational database does. Reading external files produced by a relational database into SAS via a DATA step would be the equivalent process.
Also, I suggest you try the DBCOMMIT option on your LIBNAME or CONNECT statements when creating Snowflake tables. This option can greatly improve table load times when you have a lot of data.
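For instance, a hedged sketch with DBCOMMIT as a dataset option on a Snowflake target (the libref, table names, and value are illustrative):

/* DBCOMMIT=0 issues a single commit when the load completes instead of
   committing every few thousand rows */
data snowlib.target_table (dbcommit=0);
   set work.source_table;
run;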
@David_Billa wrote:
@Patrick, any simple example of how to bulk load to SAS?
Ideally, do some searching. There are whitepapers and sample code out there that explain it much better than I could.
You might need additional bulkload options, which are documented under the link already provided, but in a nutshell it's:
proc append data=source base=target(bulkload=yes);
run;