Optimization of the below query

samuelallu · Posted 07-28-2011 07:46 AM

Hi Team,

Can you please provide your suggestions on how to optimize the below query:

proc sql;

create table low_level_min_dates as

select DISTINCT mca.sort_order_id, mca.sps_mca_id,sps_bca_id,sps_card_id,sps_industry_id,sps_merchant_id, sps_product_id, min(sps_time_id) as sps_time_id

from lup_mca_match mca inner join fact.fact_billings_month bil on mca.sps_mca_id = bil.sps_mca_id AND bil.sps_billings_type_id = 1 AND bil.bill_bud_usd_am ne .

group by sort_order_id, mca.sps_mca_id, sps_bca_id, sps_card_id, sps_industry_id, sps_merchant_id,sps_product_id

order by sort_order_id, sps_time_id ;

quit;

NOTE: Table WORK.LOW_LEVEL_MIN_DATES created, with 48812804 rows and 8 columns.

NOTE: PROCEDURE SQL used (Total process time):

real time 1:58:14.00

cpu time 21:58.92

Table fact.fact_billings_month has 200 million records and there are indexes on all the variables and table lup_mca_match has 6000 records and my fact is using a SPDE engine.

So can you please suggest me in the different ways the above query can be written in reducing the real time.

Regards,

Samuel.

Resa · Posted 07-28-2011 09:09 AM

I do not have a direct solution but for such cases I often use the "_method" option as described in TS-DOC 533.

This document also provides additional information and tips on how to optimize an SQL query.

Besides "_method" I furthermore use the INOBS option to (initially) limit the number of rows that are processed to evaluate the result of my optimization.

I hope this hint helps you to optimize your query.

samuelallu · Posted 07-28-2011 09:24 AM

Thanks for ur inputs i have already used _Method which is using the index algorithm which is right and do u have any idea on implementing the same query in any other alternate method.. Once again thanks for your inputs.

LinusH · Posted 08-03-2011 03:41 AM

My experience is that it can be hard to affect the SQL planner.

And I don't you can rewrite the query to boost performance, it is already quite straightforward.

As mentioned earlier, you could probably the ORDER BY since you are using DISTINCT (which is on the other hand unnecessary since you are using GROUP BY).

Probably won't help, but move your subsets to a WHERE clause instead.

Sometimes index joins can be quite I/O intense, which could explain the quite big difference between CPU and Real Time.

Set OPTIONS MSGLEVEL=I FULLSTIMER, they will letr you know in more detail how you query affects the system.

/Linus

Data never sleeps

Ksharp · Posted 08-03-2011 01:28 AM

Your code is good.If you can ,remove 'order by' clause, because sql will sort variables automatically.

And if you can try to make some index for your inner join 's variables ,which can reduce the time a lot.

Ksharp