DATA Step, Macro, Functions and more

Query optimisation suggestion

Contributor
Posts: 44

Query optimisation suggestion

Hi All,

I have a huge Oracle table in which one column's values are being populated as missing. I have been asked to find out how many rows are missing (a count), and for which group variable. I am writing the following query, but it is taking a lot of time (running for more than 3 hours).

LIBNAME gdw ORACLE PATH=P1 SCHEMA=W USER=R PASSWORD="XXXXXX";

libname ip '/gpfs3';

%macro m1(period);
proc sql outobs=1000;
  create table ip._&period. as
  select count(status) as cnt,
         period_id
  from gdw.ACCT_HIST;
  where status is missing and period_id="&period"
  group by period_id;
quit;
%mend;

%m1(201307)
%m1(201306)
%m1(201305)
%m1(201304)
%m1(201303)
%m1(201302)

Please suggest any alternatives to get faster results.

Super User
Posts: 5,432

Re: Query optimisation suggestion

Posted in reply to yashpande

First, it seems you have one semicolon too many; as written, your WHERE and GROUP BY clauses don't execute.

As for optimization, you need to know a little about how this table is set up in Oracle.

Is the table partitioned by period_id? If not, your total run time would probably benefit from executing in one pass (not looping through the different periods).

Apart from that, the query is quite straightforward, and if SAS pushes the query down to Oracle, then you need to talk to the Oracle DBA about Oracle performance.

To see what is really being passed down to Oracle, specify:

options sastrace=',,,d' sastraceloc=saslog;
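
A commonly used variant of that statement adds NOSTSUFFIX, which drops the timestamp suffixes and makes the trace output in the log easier to read:

options sastrace=',,,d' sastraceloc=saslog nostsuffix;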

Data never sleeps
Super Contributor
Posts: 307

Re: Query optimisation suggestion

Posted in reply to yashpande

Is there a reason you need to output a separate table for each period_id? Especially since each table will consist solely of the period_id and the count of missing obs? Why not create one table with a "group by" based on period_id?
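
A sketch of that single-pass idea, reusing the libnames and column names from the original post (ip.missing_counts is a hypothetical output name; the IN list holds the period values from the macro calls, and COUNT(*) is used deliberately, because COUNT(status) only counts non-missing values and would always return 0 under this WHERE clause):

proc sql;
  create table ip.missing_counts as
  select period_id,
         count(*) as cnt
  from gdw.ACCT_HIST
  where status is missing
    and period_id in ('201302','201303','201304','201305','201306','201307')
  group by period_id;
quit;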

Trusted Advisor
Posts: 3,214

Re: Query optimisation suggestion

Posted in reply to yashpande

The code is error free. The question is about performance.

Be happy that the machine is still not smart enough to transform an unstructured question into an answer on its own: you still have something to do and think about.

Some advice:

- Break up your research into smaller parts.

Eg. extract the missing records to a smaller table to do more research on. The prerequisite is that this number and size must fit in your environment.

The smaller the dataset to work on, the easier it becomes.

- Understand the physical effects of your request.

Indeed, check with your Oracle DBA (as LinusH already stated). If there are views (a disadvantage) or indexes (an advantage), you can imagine what the DBMS will do internally.

- Sequential/random access is normally invisible, as it is hidden behind the SQL.

When the effect is access to the whole table, touching every record, try to have it done just once and sequentially.

When the effect is accessing more than (rule of thumb) 10% of the records, still do it sequentially in one pass; if it is less, random (indexed) access would be faster.

Getting some answers, iterate: the new questions will need to be solved.
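
The "extract the missings to a smaller table" step above could be sketched like this (reusing the libnames from the original post; ip.miss_acct is a hypothetical name, and the SELECT list should be trimmed to only the columns the research actually needs):

proc sql;
  create table ip.miss_acct as
  select *
  from gdw.ACCT_HIST
  where status is missing;
quit;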

---->-- ja karman --<-----
Trusted Advisor
Posts: 3,214

Re: Query optimisation suggestion

For the technical part: SAS/ACCESS to Oracle has a lot of tuning options.

For a simple read/retrieve the defaults should commonly be acceptable. The read buffer size can help to improve throughput, and some functions cause heavy network load:
http://support.sas.com/resources/papers/proceedings13/081-2013.pdf
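
For example, the read buffer can be raised on the LIBNAME statement (READBUFF= is a SAS/ACCESS to Oracle option; 32767 rows is just an illustrative value, and the other connection options are copied from the original post):

LIBNAME gdw ORACLE PATH=P1 SCHEMA=W USER=R PASSWORD="XXXXXX" READBUFF=32767;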

---->-- ja karman --<-----
Super Contributor
Posts: 418

Re: Query optimisation suggestion

Posted in reply to yashpande

Just as a general question: are you running this on a local SAS machine that is not connected to a server, or are you running it on a server that is in a different storage location from your Oracle server?

If so, it could be your network that is causing a lot of the slowdown. I know because I was running reports on a local machine that was accessing a server halfway across the country, and a query that would take ~30 seconds on the server was taking over 45 minutes to go through the network (especially when you're returning a lot of data).

Just another thing to check.
