BookmarkSubscribeRSS Feed
samuelallu
Calcite | Level 5

Hi Team,

Can you please provide your suggestions on how to optimize the below query:

proc sql;

create table low_level_min_dates as

select DISTINCT mca.sort_order_id, mca.sps_mca_id,sps_bca_id,sps_card_id,sps_industry_id,sps_merchant_id, sps_product_id, min(sps_time_id) as sps_time_id

from lup_mca_match mca inner join fact.fact_billings_month bil on mca.sps_mca_id = bil.sps_mca_id AND bil.sps_billings_type_id = 1 AND bil.bill_bud_usd_am ne .

group by sort_order_id, mca.sps_mca_id, sps_bca_id, sps_card_id, sps_industry_id, sps_merchant_id,sps_product_id

order by sort_order_id, sps_time_id ;

quit;

NOTE: Table WORK.LOW_LEVEL_MIN_DATES created, with 48812804 rows and 8 columns.

NOTE: PROCEDURE SQL used (Total process time):

      real time           1:58:14.00

      cpu time            21:58.92

Table fact.fact_billings_month has 200 million records and there are indexes on all the variables and table lup_mca_match has 6000 records and my fact is using a SPDE engine.

So can you please suggest me in the different ways the above query can be written in reducing the real time. 

Regards,

Samuel.

4 REPLIES 4
Resa
Pyrite | Level 9

I do not have a direct solution but for such cases I often use the "_method" option as described in TS-DOC 533.

This document also provides additional information and tips on how to optimize an SQL query.

Besides "_method" I furthermore use the INOBS option to (initially) limit the number of rows that are processed to evaluate the result of my optimization.

I hope this hint helps you to optimize your query.

samuelallu
Calcite | Level 5

Thanks for ur inputs i have already used _Method which is using the index algorithm which is right and do u have any idea on implementing the same query in any other alternate method.. Once again thanks for your inputs.

LinusH
Tourmaline | Level 20

My experience is that it can be hard to affect the SQL planner.

And I don't you can rewrite the query to boost performance, it is already quite straightforward.

As mentioned earlier, you could probably the ORDER BY since you are using DISTINCT (which is on the other hand unnecessary since you are using GROUP BY).

Probably won't help, but move your subsets to a WHERE clause instead.

Sometimes index joins can be quite I/O intense, which could explain the quite big difference between CPU and Real Time.

Set OPTIONS MSGLEVEL=I FULLSTIMER, they will letr you know in more detail how you query affects the system.

/Linus

Data never sleeps
Ksharp
Super User

Your code is good.If you can ,remove 'order by' clause, because sql will sort variables automatically.

And if you can try to make some index for your inner join 's variables ,which can reduce the time a lot.

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 807 views
  • 0 likes
  • 4 in conversation