BookmarkSubscribeRSS Feed
MB_Analyst
Obsidian | Level 7

I'm connecting to a database hosted somewhere in the cloud, via an ODBC LIBNAME statement. 

 

When I try to load the data into memory with a DATA step, it takes around 4 minutes to bring in ~25,000 observations. Is this slower than expected? Is there a method I can use that makes bringing the data into my WORK folder faster? Would using a hash table speed this up, or are hash tables more useful for joins? 

DATA want;
    set cloud.have (keep=id and other vars)
    where year >= 2019;
RUN;
8 REPLIES 8
Kurt_Bremser
Super User

Look at the speed of your network connection.

It might also be that the execution of your where condition takes too much time because no index is defined for year.

 

For diagnostic purposes, have the select run natively on the database for comparison.

MB_Analyst
Obsidian | Level 7

Thanks for the ideas. I also discovered some of the variables were stored as massive strings (1000+ characters). I can't imagine that helps with processing time...

SASKiwi
PROC Star

Test a row count from your ODBC connection:

proc sql;
  select count(*)
  from cloud.have
  where year >= 2019;
quit;

If this runs a lot faster then you can conclude your network connection is what is slowing you down.

 

Patrick
Opal | Level 21

@MB_Analyst wrote:

Thanks for the ideas. I also discovered some of the variables were stored as massive strings (1000+ characters). I can't imagine that helps with processing time...


 

Network throughput is often the bottleneck and the length of the variables will impact on performance.

 

If you know that the variable lengths are too big then you could reduce them on the DB side before transferring the data over the network. This would certainly work with explicit SQL pass-through but you could also give below code a try.

I'm not sure if SAS will push the length definition to the database so you'll have to check in the log how the actual SQL sent to the DB looked like (the options posted will create such SAS log information).

options sastrace='d' sastraceloc=saslog nostsuffix;
proc sql;
  create table want as
    select
      c.id,
      strip(c.varLen1000) as varLen length=$100,
      ....
    from cloud.have c
    where c.year >= 2019
    ;
quit
reprui
SAS Employee

Massive strings may be handled by adding this option to your LIBNAME statement:

DBMAX_TEXT= LIBNAME Option

Determines the length of any very long DBMS character data type, such as BLOB or CLOB, that is read into SAS or written from SAS when using a SAS/ACCESS engine.


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

Patrick
Opal | Level 21

@reprui 

That's to define the max string length exchangeable between SAS and a DB and to avoid truncation for long strings. It has no impact on performance (unless there is string truncation).

reprui
SAS Employee

Agreed in some context...However, I've seen multiple applications of this in customer use cases where it does impact performance.  Especially when leveraging in-database PROC's (e.g., MEANS, TABULATE, FREQ, REPORT, PRINT).  During the "READ" operation is where this occurs.  Also, connecting via a Libname in EG (or other SAS UI) and viewing Hadoop data (for example) that has 32767 byte columns will lock your session even if it is only one column.

 

Of course the optimal "best practice" is to format the data at its source properly and not leave it to be defined by a default value of STRING and without a column width specification.  Typically this requires the Data Management team to be more intentional about data quality.

 

The problem can also be solved by creating well structured "Views" of the data if the source tables cannot be modified.

 

Here are some additional links that discuss relevant topics:

Performance Considerations

Optimizing Your SQL Usage


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

Ksharp
Super User

Try option:

 

libname cloud ODBC ......  readbuff=10000 ;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 3784 views
  • 0 likes
  • 6 in conversation