Jroe
Obsidian | Level 7
ODBC. Funny you mention that. I passed the commands below on to folks to try... they worked for me, but the "tcpmsglen" setting failed for a user.

options tbufsize=512k;
options TCPMSGLEN=512k;

I chose 512k to match the common Intel transmit/receive buffer size of 512k. Thanks!
ChrisNZ
Tourmaline | Level 20

TBUFSIZE is for SAS/CONNECT only afaik, and is set at signon, or in the config file.

TCPMSGLEN is set in the config file.

 

ODBC might be the reason for your speed and single-thread woes. This driver hasn't evolved in years (again afaik).

You may want to test the native SAS/ACCESS Interface to Microsoft SQL Server and see if speed is better.
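If you do test it, something like this minimal sketch might be a starting point, assuming the product is licensed and an ODBC DSN already exists (the libref, DSN, credentials, and schema below are made up):

/* Hypothetical example: SQLSVR is the SAS/ACCESS Interface to Microsoft SQL Server engine */
libname mymssql sqlsvr datasrc=MYDSN user=myuser password="XXXXXXXX" schema=dbo;

/* Quick sanity check that the connection works and tables are visible */
proc contents data=mymssql._all_ nods;
run;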

Jroe
Obsidian | Level 7

I spoke to an admin about testing the article you linked. I have a similar thread with similar issues:

https://communities.sas.com/t5/New-SAS-User/Simple-SAS-queries-continue-to-run-single-threaded-in-MS...

 

I have since upgraded to the latest 2017 ODBC driver (from the native 2011 driver) and observe similar issues.

Driver=ODBC Driver 17 for SQL Server;

 

From what I read on the two settings we discussed, those can be set as options flags in the help guides?

ChrisNZ
Tourmaline | Level 20

>  those can be set as options flags in the help guides?

I am unsure what this means. These options can be set for the SAS session when invoking SAS.
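For example (values are illustrative only, matching the 512k you tried; 524288 bytes = 512k), in the SAS configuration file such as sasv9.cfg:

/* in sasv9.cfg -- illustrative values only */
-TCPMSGLEN 524288
-TBUFSIZE 524288

or on the command line when SAS is invoked:

sas -tcpmsglen 524288 -tbufsize 524288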

ScottBass
Rhodochrosite | Level 12

> ODBC might be the reason for your speed and single-thread woes. This driver hasn't evolved in years (again afaik).
>
> You may want to test the native SAS/ACCESS Interface to Microsoft SQL Server and see if speed is better.

 

This was before I got here and was told to me by my colleagues, so it is third-hand...

 

When setting up our SAS environment, SAS Professional Services recommended ODBC over SAS/ACCESS to SQL Server.  I don't know why, and that sounded "fishy" to me.

 

I agree with @ChrisNZ that ODBC is a generic standard, a "vanilla" approach to multiple RDBMS's.  However, the driver itself is from Microsoft, and AFAIK the SAS ODBC engine is just a wrapper around the capabilities/API's available from the Microsoft driver.

 

When I choose "Export Data" from SSMS, I get these drivers in my environment:

 

[Screenshots: the list of drivers offered by the SSMS "Export Data" wizard]

 

I have never tested them all - I've always used SQL Server Native Client 11.0 - but I bet I'd get different performance characteristics across the different drivers.

 

Call or email SAS TS.  Explain your problem.  Ask them whether there are performance differences between the generic SAS/ACCESS to ODBC and SAS/ACCESS to SQL Server.  But if your underlying hardware infrastructure (network bandwidth, local disk I/O) can't keep up with how fast SS is passing data to the client (SAS), I don't see how the different engines would compensate for that.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
ScottBass
Rhodochrosite | Level 12

> I still see a high amount of MS SQL wait types of ASYNC_NETWORK_IO (SAS reads data slower than SQL sends it), it would be great to find a way to reduce this wait type

 

This isn't necessarily SAS's fault.  If your SQL Server (SS) is highly spec'd, but 1) your network between SS and SAS is slow, 2) your SAS server is lowly spec'd, 3) your driver is inefficient (e.g. Microsoft's ODBC driver), or 4) something else on the client side is the bottleneck, then you will see these wait types.

 

https://www.sqlshack.com/reducing-sql-server-async_network_io-wait-type/

 

> It primarily happens when SAS runs data "single threaded" instead of parallel processing

 

Have a look at the dbsliceparm and dbslice options:

 

https://documentation.sas.com/?docsetId=acreldb&docsetTarget=p0qs980zd6e2ain0zb0903igkfje.htm&docset...

https://documentation.sas.com/?docsetId=acreldb&docsetTarget=p1etmj5mp89b8un13on2um4ivhav.htm&docset...

 

Here is an excerpt from one of my programs:

 

   options dbsliceparm=(ALL,4);


* dbslice: <table integer column> % (T-SQL mod operator) <number of threads> = 0 - <number of threads - 1> ;
* if you use rldx_archive_key, add an additional slice to account for NULL rldx_archive_keys ;
* the sum of the row counts in each slice MUST equal the total row count of the source table! ;
set
   tmp.rldx_pub_master (
      dbslice=(
         "rldx_archive_key % 4 = 0"
         "rldx_archive_key % 4 = 1"
         "rldx_archive_key % 4 = 2"
         "rldx_archive_key % 4 = 3"
         "rldx_archive_key % 4 IS NULL"
      )
   )
   tmp.rldx_pvt_master (
      dbslice=(
         "rldx_archive_key % 4 = 0"
         "rldx_archive_key % 4 = 1"
         "rldx_archive_key % 4 = 2"
         "rldx_archive_key % 4 = 3"
         "rldx_archive_key % 4 IS NULL"
      )
   )
   end=eof;

Using this approach, I got much better performance than without these options.

 

The best candidate for the dbslice column is a NOT NULL column with even distribution across the table, such as an IDENTITY column.  SAS will attempt to pick a suitable column and algorithm (usually (always?) MOD) to slice the table, but I like to explicitly specify the dbslice conditions.
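If you'd rather not hand-code the slice conditions, a minimal alternative sketch is to put DBSLICEPARM= on the table itself and let SAS derive the MOD conditions (the libref, table, and thread count below are made up, not from this thread):

proc sql;
   create table work.extract as
   select *
   from myss.big_table (dbsliceparm=(ALL,4));   /* SAS picks the slicing column and threads the read */
quit;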

 

> it only occurs when SAS communicates to sql.

 

Are you sure about that?  Have you analyzed this wait type against all your other applications that hit your SQL Server...or just SAS?  What about your .Net applications that pull GB's of data from SS and process the results on a "slow local drive"?  And those applications would need to be pulling the same volume of data as you are likely pulling from SAS.  And have the same network speed back to your server.  Etc, etc.

 

> Often users are pulling data down to their slower local drives into sas,

 

Not SAS's fault then.  Improve the I/O to the "slower local drives" to the same standard as your SS disk I/O.

 

> A few tips that I have passed on to others:

> a. if you're joining a table that exists in SAS to SQL, that table doesn't exist in SQL, so SAS will "select the columns needed, pull them back to SAS, then join"; so they could add the table to SQL if needed (and index it), or index their table in SAS

 

Agreed, in general I find homogeneous joins to be faster.  IOW:

 

If you have a large table in SS, and a small dataset in SAS, upload the SAS dataset to SS, join using explicit passthrough, then download the (hopefully smaller) results to SAS (if needed - you might continue downstream processing with the data already on SS, which is what we often do). 
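A minimal sketch of that pattern, assuming an ODBC libref (here called MYSS) and made-up server, table, and key names; CONNECT USING needs a reasonably recent SAS release:

libname myss odbc noprompt="Driver=ODBC Driver 17 for SQL Server;Server=myserver;Database=mydb;Trusted_Connection=yes;";

/* 1. Upload the small SAS dataset to SQL Server */
data myss.small_lookup;
   set work.small_lookup;
run;

/* 2. Join on the SQL Server side via explicit passthrough, download only the result */
proc sql;
   connect using myss;
   create table work.results as
   select * from connection to myss (
      select b.*
      from dbo.big_table b
      inner join dbo.small_lookup s
         on b.key_col = s.key_col
   );
   disconnect from myss;
quit;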

 

If you have a large dataset in SAS, and a small table in SS, AND your SAS server has similar capabilities to your SS (768GB of RAM, 24 Xeon cores, 2000MB/s+ read/write), then download the small SS table to SAS and join there.
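In that case a sketch would be (again with the hypothetical MYSS libref and made-up names):

/* Pull the small SQL Server table into SAS first, so the join is SAS-to-SAS */
data work.small_table;
   set myss.small_table;
run;

proc sql;
   create table work.joined as
   select a.*, b.extra_col
   from work.big_sas_dataset a
   left join work.small_table b
      on a.key_col = b.key_col;
quit;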

 

In my environment, we have a powerful SS machine, a puny SAS machine (sigh), and average network speed between the two (not fast, not slow), so we do as much processing on the SS machine as possible by uploading data to SS and using explicit passthrough.  We optimize complex queries in SSMS before cut-and-pasting to SAS.

 

> b. remove "SAS" code that doesn't exist in sql (macro variables, certain functions etc)

 

Incorrect advice.  SAS macros and macro variables get resolved in a pre-compile phase, a bit analogous to the pre-processor in C programming.  Macros and macro variables are, in general - ignoring data step interfaces such as symputx and symget - just a code generation step, BEFORE the SAS compiler or SAS procedure executes.  There is nothing wrong with including SAS macros or macro variables in SQL code; even in explicit passthrough they get resolved before SAS sends the query to SS.  The end result is the equivalent of you typing in the resolved code from the macro or macro variable.
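For example, the macro variable below resolves before the query text ever leaves SAS, so SQL Server only sees the literal value (libref and names are hypothetical; macro variables inside single quotes not resolving is a separate gotcha):

%let yr = 2018;

proc sql;
   connect using myss;
   select * from connection to myss (
      select count(*) as n
      from dbo.big_table
      where year_col = &yr      /* SS receives: where year_col = 2018 */
   );
   disconnect from myss;
quit;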

 

Yes, if there is some SAS specific function that cannot be sent to SS, then SAS may have to download the entire SS table, then filter within SAS.  But there is usually a way of optimizing this esp. via native T-SQL and explicit passthrough.  And you would face the same issues in a .Net program, Python program, Excel workbook, etc., that was hitting your SS.  If your data is in SS, filter it first on SS, then apply your application logic/processing.
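One way to see what is actually being passed down is SASTRACE; the log then shows whether the WHERE clause was shipped to SS or the whole table was pulled back (libref and column are made up):

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

proc sql;
   create table work.filtered as
   select *
   from myss.big_table
   where year(load_date) = 2018;   /* check the log: was this WHERE passed to SS? */
quit;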


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
ChrisNZ
Tourmaline | Level 20

> The server is enterprise with 768GB of ram and 24 cores (xeon), it can read/write at 2000MBs+. 

 

At my previous job (I'm not involved on the SAS admin side at the moment), our SAS server (under Windows), equipped with two 10Gb NICs, could never use both at the same time, and max network speed never reached 1GB/s.

The supplier (HP) spent time investigating and told us it was a Windows limitation and the same gear running Linux would run at full speed.

Whether this is useful or not I'm not too sure.

 

