Help using Base SAS procedures

proc sql optimisation

Reply
New Contributor
Posts: 2

proc sql optimisation

Hi,

I'm trying to optimise my proc sql to be as efficient as it can be.

I'm matching a large dataset onto a performance file, but the rows selected with change each month depending on the outcome window.

Proc sql is processed in the following order (I think)

proc sql;

select ...5

from ...1

where ...2

group by ...3

having ...4

order by ...;6

quit;

For efficiency, would it be better to have a where statement in the from line?  Would this get processed first?

Thanks,

C

PROC Star
Posts: 7,480

proc sql optimisation

I don't think it would make much, if any, difference.  At least it hasn't, consistently, on any of my own tests.

Respected Advisor
Posts: 3,156

Re: proc sql optimisation

Not sure I completely understand your question, but here is my 2 cents:

1. 'where' clause is processed even before 'select' clause in term of processing.

2. Sometimes you can't change the writting orders, and in this case, it does not matter anyway.

3. SQL is very efficient if you do it completely natively, say within Oracle by pass-through method.

4. if your datasets are SAS tables, you could explore other options, such as data step merge, or with index, or hash(), it can only be determined on case by case bases.

A good summary paper on 9 ways of joining datasets:

http://www.nesug.org/Proceedings/nesug09/po/po13.pdf

Regards,

Haikuo

New Contributor
Posts: 2

proc sql optimisation

Thanks for your replies!  Smiley Surprised)

I'm doing a left join onto a performance table and restricting by outcome month, but the performance file will get very large over time. 

My question was around the location of the where clause, at present it's in the where select at the bottom, but I wondered if anyone knew for efficiency if it was better to put this after the table reference in the from statement?

The code below worked using ODBC, but when I run it in enterprise guide it gives a syntax error for the and line - is this because there is more than one variable in the select statement?

At the moment, the datasets are very small, but when we switch portfolio, optimising the code below is key!

proc sql;

  create table cj as

  select A.*, B.*

    from &mthy. a left join perform.&perf_file. b

        on a.cs_id = b.cs_id

       and (cs_id, arrears) IN

  (select cs_id, MAX(arrears) as flag_3

    from perform.&perf_file. b

   where '01DEC2011'D <= B.outcome_month <= '28FEB2012'D

   group by B.cs_id)

   ;QUIT;

Ask a Question
Discussion stats
  • 3 replies
  • 260 views
  • 0 likes
  • 3 in conversation