BookmarkSubscribeRSS Feed
speaky
Calcite | Level 5

Hi,

I'm trying to optimise my proc sql to be as efficient as it can be.

I'm matching a large dataset onto a performance file, but the rows selected with change each month depending on the outcome window.

Proc sql is processed in the following order (I think)

proc sql;

select ...5

from ...1

where ...2

group by ...3

having ...4

order by ...;6

quit;

For efficiency, would it be better to have a where statement in the from line?  Would this get processed first?

Thanks,

C

3 REPLIES 3
art297
Opal | Level 21

I don't think it would make much, if any, difference.  At least it hasn't, consistently, on any of my own tests.

Haikuo
Onyx | Level 15

Not sure I completely understand your question, but here is my 2 cents:

1. 'where' clause is processed even before 'select' clause in term of processing.

2. Sometimes you can't change the writting orders, and in this case, it does not matter anyway.

3. SQL is very efficient if you do it completely natively, say within Oracle by pass-through method.

4. if your datasets are SAS tables, you could explore other options, such as data step merge, or with index, or hash(), it can only be determined on case by case bases.

A good summary paper on 9 ways of joining datasets:

http://www.nesug.org/Proceedings/nesug09/po/po13.pdf

Regards,

Haikuo

speaky
Calcite | Level 5

Thanks for your replies!  :O)

I'm doing a left join onto a performance table and restricting by outcome month, but the performance file will get very large over time. 

My question was around the location of the where clause, at present it's in the where select at the bottom, but I wondered if anyone knew for efficiency if it was better to put this after the table reference in the from statement?

The code below worked using ODBC, but when I run it in enterprise guide it gives a syntax error for the and line - is this because there is more than one variable in the select statement?

At the moment, the datasets are very small, but when we switch portfolio, optimising the code below is key!

proc sql;

  create table cj as

  select A.*, B.*

    from &mthy. a left join perform.&perf_file. b

        on a.cs_id = b.cs_id

       and (cs_id, arrears) IN

  (select cs_id, MAX(arrears) as flag_3

    from perform.&perf_file. b

   where '01DEC2011'D <= B.outcome_month <= '28FEB2012'D

   group by B.cs_id)

   ;QUIT;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 822 views
  • 0 likes
  • 3 in conversation