Hi Team,
Can you please provide your suggestions on how to optimize the below query:
proc sql;
create table low_level_min_dates as
select DISTINCT mca.sort_order_id, mca.sps_mca_id,sps_bca_id,sps_card_id,sps_industry_id,sps_merchant_id, sps_product_id, min(sps_time_id) as sps_time_id
from lup_mca_match mca inner join fact.fact_billings_month bil on mca.sps_mca_id = bil.sps_mca_id AND bil.sps_billings_type_id = 1 AND bil.bill_bud_usd_am ne .
group by sort_order_id, mca.sps_mca_id, sps_bca_id, sps_card_id, sps_industry_id, sps_merchant_id,sps_product_id
order by sort_order_id, sps_time_id ;
quit;
NOTE: Table WORK.LOW_LEVEL_MIN_DATES created, with 48812804 rows and 8 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 1:58:14.00
cpu time 21:58.92
Table fact.fact_billings_month has 200 million records and there are indexes on all the variables and table lup_mca_match has 6000 records and my fact is using a SPDE engine.
So can you please suggest me in the different ways the above query can be written in reducing the real time.
Regards,
Samuel.
I do not have a direct solution but for such cases I often use the "_method" option as described in TS-DOC 533.
This document also provides additional information and tips on how to optimize an SQL query.
Besides "_method" I furthermore use the INOBS option to (initially) limit the number of rows that are processed to evaluate the result of my optimization.
I hope this hint helps you to optimize your query.
Thanks for ur inputs i have already used _Method which is using the index algorithm which is right and do u have any idea on implementing the same query in any other alternate method.. Once again thanks for your inputs.
My experience is that it can be hard to affect the SQL planner.
And I don't you can rewrite the query to boost performance, it is already quite straightforward.
As mentioned earlier, you could probably the ORDER BY since you are using DISTINCT (which is on the other hand unnecessary since you are using GROUP BY).
Probably won't help, but move your subsets to a WHERE clause instead.
Sometimes index joins can be quite I/O intense, which could explain the quite big difference between CPU and Real Time.
Set OPTIONS MSGLEVEL=I FULLSTIMER, they will letr you know in more detail how you query affects the system.
/Linus
Your code is good.If you can ,remove 'order by' clause, because sql will sort variables automatically.
And if you can try to make some index for your inner join 's variables ,which can reduce the time a lot.
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.