BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Anotherdream
Quartz | Level 8

Hello! Does anyone know if there is a book or area I can search that will explain the relationship between SAS and sql server tables pulled through a libname odbc connection?

FOr example, if I had the code:

libname mine odbc dsn='mydsn' schema=dbo bulkload=yes dbmax_text=32000;

data answer;

set mine.Table1;

Where module="Module 1";

run;

I would expect it to run faster if the column "module" had an index on it compared to it not having an index on it. In my own testing however, I cannot determine a noticable difference in the run speed, but that might be due to the fact that the table only has ~600,000 rows on it.

I am just curious if Base SAS actually takes indexes into consideration when pulled through a libname statement, and or how the libname statements treats these!

Thanks!

Brandon

1 ACCEPTED SOLUTION

Accepted Solutions
smiller933
Calcite | Level 5

If you have access to your DBMS-specific query tool (not SAS, but a stand-alone query tool), you might try running the query there. Most tools have ways to tell you the query plan that will tell you whether it will use an index or indexes, and which ones. I've worked with both Oracle and Microsoft SQL Server, and whenever I had a question like yours, I'd use the query tool to check the query plan and find out if the index was being used.

View solution in original post

14 REPLIES 14
Reeza
Super User

Does changing the message level help in this case? I know it will for SAS tables, but not sure about pass through as well.

options msglevel=i;

Anotherdream
Quartz | Level 8

Hey Reeza, thanks for your help. This actually didn't change anything in the log when I specified this option. Not sure if it was supposed to tho (this is the first time i've ever heard of the msglevel option, and the online info on it wasn't totally clear in my opinion).

art297
Opal | Level 21

You didn't include the dbindex= option thus probably never got the index.  When I googled it I found the following v8 description: ODBC Chapter, First Edition : LIBNAME Statement: ODBC Specifics

Tom
Super User Tom
Super User

What run speed changes did you expect to find?  Most of the speed would be limited by the speed that SAS can pull the data from the database and write it to the disk.  How long it takes the underlying database to locate the records would probably be swamped by those other factors.

RichardinOz
Quartz | Level 8

Have you tried using a libname that does not specify bulkload?  There may be an interaction between this option and the data read.  Bulkload is only of use in specified conditions when writing to an external table, and probably suspends indexing while loading.  It is at best redundant when reading data.  If you need it for writing to the table I would suggest using a different libname for the load.

The external database optimiser may also have an impact on whether the index is used.  If the number of records that satisfy your where condition is above a certain threshold the engine may decide it is more efficient to extract the whole table than to use the index to hunt for particular rows.

Richard temporarily back in Oz

AndrewHowell
Moderator

There are options to dump to the log the SQL code which SAS uses:

options sastrace=",,,d" sastraceloc=saslog;

(Yes, that's 3 commas followed by the letter d.)

This should show the exact SQL code which SAS passes through to the database, which may very well include indexing statements.

Anotherdream
Quartz | Level 8

Hello Tom. This appears to be a general pattern in responses that I recieve (mostly from you, but it also appears to be a pattern online). Can you elaborate more onto what are the major factors that influence how quickly SAS can pull the data and write it to disk? Is this primarily memory restricted, network restricted, etc?

I know for a fact my companys network is...... lets go with "not great" haha. Furthermore I am only working on a machine with 8 Gigs of ram total.

Also, when I run this code andrew I get the sql code that is run, (it's a simple select X,X1,X3  from Y1  where Z1="") however how would that help in knowning if it is using the index correctly? I guess technically since I am pulling from a column that has an Index on it it "should" be using it, but I am not sure that it is.

Thanks again everyone!

smiller933
Calcite | Level 5

"how would that help in knowning if it is using the index correctly? I guess technically since I am pulling from a column that has an Index on it it "should" be using it, but I am not sure that it is."

By knowing what query is passed, you can make an educated guess about whether an index would be used. Since the parameter is being correctly passed to the DBMS, you know that at this point the DBMS query engine has the info it needs to potentially use the index. Now it's up to the query engine to decide whether or not to use it. You can read entire books on indexing and optimization.

There are many queries that SAS has a hard time converting into SQL that can be processed by the DBMS -- in those cases, SAS pulls down the entire contents of the table and processes the query locally. You want to avoid this, except on small tables where it doesn't really matter.

BTW, IMHO text strings don't make great indexes unless they are very highly selective -- the usual rule of thumb is indexes are useful if they return less than ~15% of the data, but with a text string, I'd use a rule of thumb more like less than 5%. So something like a person's last name is good, but something like college major where you have about 30 different values, that's not so great. This is starting to dip into that huge topic of indexing.

smiller933
Calcite | Level 5

If you have access to your DBMS-specific query tool (not SAS, but a stand-alone query tool), you might try running the query there. Most tools have ways to tell you the query plan that will tell you whether it will use an index or indexes, and which ones. I've worked with both Oracle and Microsoft SQL Server, and whenever I had a question like yours, I'd use the query tool to check the query plan and find out if the index was being used.

Anotherdream
Quartz | Level 8

Haha good timing, I just read your previous post after this one, but we're saying the exact same thing.

That is kinda what I was expecting the answer to be. I'm going to have to assume it is using the Index then because the exact same sql query does use the index so that's how I'll have to treat it moving forward.

Also, the index does break the data into about ~1-2% distinct groups (there are over 100 distinct text values, of differing sizes) so it actually does dramatically speed up queries. Also it's the only logical breakdown in the data above the indivdual Loan Level, which I can't filter on by definition.

Thanks for the advice however!

smiller933
Calcite | Level 5

Yep, I'd say that's a safe assumption that it is using the index. If it's slow, then you have to figure other factors like network speed are causing the issue.

This is a long shot, but I'll try it anyway. If you are using select * instead of listing the columns you need, you might get a huge improvement listing the columns. If network speed is your bottleneck,  you could get a huge improvement by virtue of sending less data over the wire.

smiller933
Calcite | Level 5

After pressing "Add Reply" I realized I should look at your original query, and I see you are using a datastep with no keep and no drop.

So I strongly recommend that you add a keep statement and add just the columns you need. This can produce a huge improvement, particularly if the table contains a lot of variables, or if the table contains one HUGE text field that you never use (my company has a lot of cases like that).

To have an effect, the keep statement needs to be a dataset option like this:

data answer;

set mine.Table1 (keep=module myvar1 myvar2 myvar3);

Where module="Module 1";

run;

Anotherdream
Quartz | Level 8

Hello again smiller!

This is actually just a theoritical problem calling an imaginary table. In practice I only keep the columns I need related to the query, I was just curious as to if there was some kind of log or variable created within the SAS system that said "index XYZ was accessed" etc..

Also, I KNOW that network speed is my largest bottle neck by a simple experiment. We created a VM that was in the same location (as in Geographic location) as the servers the databases were stored on, and ran the same query, and it took 4 seconds. Over our network it took 5+ minutes.

So yeah, this was more of a theory craft question not related to performance in general, but how the inner workings of SAS work (many of my questions are).

My comment about the performance not being distriguishable was tested on the VM, and it took 4.01 seconds to 4.03 seconds (averages) but a t tests on the multiple tests showed the means weren't statistically different. Anyways, i'm pretty confident that the 4 seconds it is taking is again a bottle neck on the network (since techincally it still has to go over the network on the vm).

Anyways, your solution of just running the query in sql server using the index tools is what I plan on doing going forward.

jmic_nyk
Obsidian | Level 7

It depends on several things

  1. the query that is executed on the DBMS - does it use the index. Sometimes the DBMS decides not to use an index because of overhead (eg if there is relatively many occurences of "Module 1"). Try to force the DBMS to use the index (I think it's the dbindex= option for the libname statement). Sometimes it will acctually run slower when forced to use an index.
  2. have you tried the readbuff= option for the libname statement?
  3. try to debug your query using the sastrace= option (se another reply)

I don't think that SAS decides weather to use index on a DBMS table or not. In my opnion it's a decision made by the DBMS. ODBC is great, especially if you use more than one DBMS, but my experience is that it is difficult to "fine-tune" your I/O using ODBC. In order to do so you should use a DBMS specific engine instead.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 2911 views
  • 0 likes
  • 8 in conversation