Solved: Re: Indexes syntax to speed up queries

Peter_B · Posted 06-22-2018 01:22 AM

Hi,

I have some large tables that I need to query but sometimes the query takes so long, SAS EG times out or crashes.

I got a list of the Indexes from our IT guys & read up on some ways to improve my query but I think my syntax is wrong.

Below are a couple of examples that just don't run at the moment.

Any tips as to my syntax welcomed.

I read that writing (idxwhere=no) after the table name will stop SAS doing a full table scan.

My error is just a general syntax error so I must have it really wrong.

Thanks for any help.

Select *

from Table_a A (idxwhere=no)

Inner Join Table_b B

on A.ID = B.ID

------------------------------------------

Select *

from Table_a A (idxname=Cust_id)

Inner Join Table_b B

on A.ID = B.ID

Patrick · Posted 06-23-2018 03:52 AM

@Peter_B

Shouldn't the Index field exist on the table when I query it?

"index fields" don't exist. There can be fields for which an index has been created. Look at an index as a separate object besides of the table. It's kind-off a table of content so when you're looking for something instead of having to scan through the whole table, you first go to the index which tells you where the data lives in the table.

There is always the odd exception but normally there is nothing special you need to do/code for Oracle to use an index if and as appropriate. You don't need to specifically "mention" the index name in your code.

Looking at the code you've previously posted I would expect C.CMX_TRAN_ID to be indexed (or it's even the primary key).

Select xxxx
from TDRCORE.FCM_CSDP C 
INNER JOIN FCMCORE.FCM_RULES_FIRED F
ON C.CMX_TRAN_ID = F.CMX_TRAN_ID

May be it's just the data volume your query returns, which makes that slow. Run a Select count(*) from <your query> to test this theory.

Also: Is there also a Create Table in your actual SQL or are you actually creating output. If the latter then this could certainly take up a bit of time as soon as data volumes grow.

And last but not least:

Make sure that libname option READBUFF (for TDRCORE and FCMCORE) is set to a much higher value than the default.

http://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.3&docsetId=acreldb&docsetTarget=p0fmg...

View solution in original post

andreas_lds · Posted 06-22-2018 01:26 AM

Have a look at the documentation of proc sql - you need

proc sql;
 <insert query here>;
quit;

EDIT: Hit "post" to early.

Kurt_Bremser · Posted 06-22-2018 01:55 AM

As an alternative, use sorting with proc sort and a data step merge (unless you have to deal with a many-to-many relationship).

You might be surprised by the performance.

When encountering a syntax error, always post the complete log of the failing step(s). Use the {i} button for logs.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SASKiwi · Posted 06-22-2018 03:30 AM

If you provided more details on your table joins like posting the SAS log of one of them as well as table row numbers we might be able to provide alternative techniques.

Peter_B · Posted 06-22-2018 10:59 PM

Hi SASKiwi,

The error is as below:

ERROR: ORACLE prepare error: ORA-00933: SQL command not properly ended.

The Proc SQL query runs OK without the '(idxwhere=no)' in the where clause below:

Select xxxx
from TDRCORE.FCM_CSDP C (idxwhere=no)
INNER JOIN FCMCORE.FCM_RULES_FIRED F
ON C.CMX_TRAN_ID = F.CMX_TRAN_ID

My question is where & why this query fails because of this Index reference & what's the correct way to apply this Index?

Thanks

SASKiwi · Posted 06-22-2018 11:55 PM

IDXWHERE is only valid when querying SAS data, not Oracle data.

Peter_B · Posted 06-23-2018 12:37 AM

Ok, thanks, so how do I apply an index to an Oracle query?

Patrick · Posted 06-23-2018 01:01 AM

@Peter_B

Oracle does a lot of this stuff for you "automatically". If you know what indexes exist then try to formulate your WHERE clauses and JOIN conditions along these indexes, i.e. if the is an index for ColA, ColB, ColC the write your select to first select on ColA before ColB.

What's also important when interfacing with data bases: Try and push processing to the data base so that you reduce data volumes as much as possible before you transfer the result set into SAS.

1. Avoid joins between SAS and Oracle tables (load the smaller table to the side of the larger table first)

2. Avoid using SAS functions which the SAS Access engine can't push to the database (as else all the data gets first transferred to the SAS side and only then processed)

Here the list of supported SAS functions:

http://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.3&docsetId=acreldb&docsetTarget=p0f64...

Use the following options in your code as this will show you in the SAS log what part of implicit SAS SQL has actually been sent to the database for execution - and though which part has only been executed on the SAS side.

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

And if performance becomes really important and you need full control: Use explicit SQL pass-through as this allows you to write the SQL directly in the Oracle SQL flavor and also use Oracle SQL specific functions and syntax (including hints).

You can develop such SQL directly in clients like SQL Developer and once working just copy/paste the code into the SAS code part for your explicit SQL.

Peter_B · Posted 06-23-2018 01:37 AM

Thanks Patrick,

Your explanation helps a lot.

So I now understand the Index fields should be used in the Join between the 2 tables I;m using.

I have a list of the Index fields that IT has created on the Large Transaction table I'm joining to but none of those fields exist on my Transaction or main tables?

Shouldn't the Index field exist on the table when I query it?

Do I need to go back to IT & ask how I join these 2 tables using an Index considering I can't see any Index field?

Or should I just use one of the Index fields & join both tables by an Alias?

Thanks for you help.

Patrick · Posted 06-23-2018 03:52 AM

@Peter_B

Shouldn't the Index field exist on the table when I query it?

"index fields" don't exist. There can be fields for which an index has been created. Look at an index as a separate object besides of the table. It's kind-off a table of content so when you're looking for something instead of having to scan through the whole table, you first go to the index which tells you where the data lives in the table.

There is always the odd exception but normally there is nothing special you need to do/code for Oracle to use an index if and as appropriate. You don't need to specifically "mention" the index name in your code.

Looking at the code you've previously posted I would expect C.CMX_TRAN_ID to be indexed (or it's even the primary key).

Select xxxx
from TDRCORE.FCM_CSDP C 
INNER JOIN FCMCORE.FCM_RULES_FIRED F
ON C.CMX_TRAN_ID = F.CMX_TRAN_ID

May be it's just the data volume your query returns, which makes that slow. Run a Select count(*) from <your query> to test this theory.

Also: Is there also a Create Table in your actual SQL or are you actually creating output. If the latter then this could certainly take up a bit of time as soon as data volumes grow.

And last but not least:

Make sure that libname option READBUFF (for TDRCORE and FCMCORE) is set to a much higher value than the default.

http://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.3&docsetId=acreldb&docsetTarget=p0fmg...

Peter_B · Posted 06-24-2018 01:47 AM

Thanks Patrick,

That's very informative, the ID field you mentioned is the PK so it may be the best option.

I'll ask about the READBUFF option & suggest this to IT support.

Thanks again.

Peter_B · Posted 06-24-2018 02:19 AM

I've just tried to run the Select count(*) from table query which is also taking a fair amount of time.

Would adjusting the READBUFF help here?

SASKiwi · Posted 06-24-2018 02:35 AM

READBUFF won't help here. This option is for optimising the data extraction from the database. Since select count(*) is only a single row it will have no impact. Since select count(*) is still slow then you can assume it is the query itself that is running slow, not the reading of the data across a network to your SAS server.

I suggest you post the complete SAS log of one of your slow queries including the actual run, CPU time and table row count notes.

I've always found Oracle tricky to get good performance out of for complex queries. One technique that I've found pretty useful for speeding up queries is by supplying an Oracle hint to parallelise it and by using SQL passthru to get total control. Here is an example of this:

https://communities.sas.com/t5/Base-SAS-Programming/Please-explain-the-purpose-of-statement-Select-p...

Peter_B · Posted 06-24-2018 02:54 AM

Another question for you about your comment

so when you're looking for something instead of having to scan through the whole table, you first go to the index which tells you where the data lives in the table.

I also have the Index field names that don't match any fields, so how do I use the Index filed(s) in my query?

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away