I'm connecting to a database hosted somewhere in the cloud, via an ODBC LIBNAME statement.
When I try to load the data into memory with a DATA step, it takes around 4 minutes to bring in ~25,000 observations. Is this slower than expected? Is there a method I can use that makes bringing the data into my WORK folder faster? Would using a hash table speed this up, or are hash tables more useful for joins?
DATA want;
set cloud.have (keep=id year /* plus the other variables needed */);
where year >= 2019;
RUN;
Look at the speed of your network connection.
It might also be that evaluating your WHERE condition takes too much time because no index is defined on year.
For diagnostic purposes, run the select natively on the database for comparison; a pass-through sketch follows.
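A minimal sketch of that comparison using explicit SQL pass-through; the DSN name and the table name as the database sees it are assumptions, so substitute your own connection details:
/* Run the count natively on the database and compare its runtime to the LIBNAME pull. */
proc sql;
  connect to odbc (dsn="myCloudDSN");   /* placeholder connection options */
  select * from connection to odbc
    ( select count(*) as n
      from have
      where year >= 2019
    );
  disconnect from odbc;
quit;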
Thanks for the ideas. I also discovered some of the variables were stored as massive strings (1000+ characters). I can't imagine that helps with processing time...
Test a row count from your ODBC connection:
proc sql;
select count(*)
from cloud.have
where year >= 2019;
quit;
If this runs a lot faster, then you can conclude that your network connection is what is slowing you down.
@MB_Analyst wrote:
Thanks for the ideas. I also discovered some of the variables were stored as massive strings (1000+ characters). I can't imagine that helps with processing time...
Network throughput is often the bottleneck, and the length of the variables will impact performance.
If you know that the variable lengths are too big, then you could reduce them on the DB side before transferring the data over the network. This would certainly work with explicit SQL pass-through, but you could also give the code below a try (a pass-through version follows it).
I'm not sure if SAS will push the length definition down to the database, so you'll have to check in the log what the SQL actually sent to the DB looked like (the options posted below will write that information to the SAS log).
options sastrace='d' sastraceloc=saslog nostsuffix;
proc sql;
  create table want as
  select
    c.id,
    strip(c.varLen1000) as varLen length=$100
    /* , ... remaining columns ... */
  from cloud.have c
  where c.year >= 2019
  ;
quit;
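If the log shows that SAS does not push the LENGTH= down, an explicit pass-through version does the trimming on the database side before anything crosses the network. A rough sketch; the DSN, column names, and the substring/cast syntax are placeholders that must match your DBMS's SQL dialect:
proc sql;
  connect to odbc (dsn="myCloudDSN");   /* placeholder connection options */
  create table want as
  select * from connection to odbc
    ( select id,
             cast(substring(varLen1000, 1, 100) as varchar(100)) as varLen
      from have
      where year >= 2019
    );
  disconnect from odbc;
quit;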
Massive strings may be handled by adding the DBMAX_TEXT= option to your LIBNAME statement:
Determines the length of any very long DBMS character data type, such as BLOB or CLOB, that is read into SAS or written from SAS when using a SAS/ACCESS engine.
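A minimal sketch of the LIBNAME statement with that option; the DSN name and the 1024-byte cap are example values only:
/* Cap long DBMS character columns at 1024 bytes when they are read into SAS.
   "myCloudDSN" is a placeholder for your ODBC data source. */
libname cloud odbc dsn="myCloudDSN" dbmax_text=1024;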
That option defines the maximum string length exchanged between SAS and a DB and avoids truncation of long strings. It has no impact on performance (unless there is string truncation).
Agreed in some contexts. However, I've seen multiple customer use cases where it does impact performance, especially when leveraging in-database PROCs (e.g., MEANS, TABULATE, FREQ, REPORT, PRINT); the cost shows up during the read operation. Also, connecting via a LIBNAME in EG (or another SAS UI) and viewing Hadoop data (for example) that has 32767-byte columns will lock your session even if it is only one column.
Of course, the optimal "best practice" is to format the data properly at its source and not leave columns defined with a default STRING type and no column width specification. Typically this requires the Data Management team to be more intentional about data quality.
The problem can also be solved by creating well-structured views of the data if the source tables cannot be modified; a sketch follows.
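A hedged sketch of creating such a view from SAS via explicit pass-through; the DSN, view name, columns, and the substring/cast syntax are placeholders and must follow your DBMS's SQL dialect (you also need CREATE VIEW rights on the database):
proc sql;
  connect to odbc (dsn="myCloudDSN");   /* placeholder connection options */
  execute (
    create view have_trimmed as
      select id, year,
             cast(substring(varLen1000, 1, 100) as varchar(100)) as varLen
      from have
  ) by odbc;
  disconnect from odbc;
quit;
SAS then reads cloud.have_trimmed instead of cloud.have, so only the trimmed columns cross the network.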
Try the READBUFF= option, which controls how many rows are fetched from the database per round trip:
libname cloud odbc ...... readbuff=10000;