Optimise the sas sql query

LineMoon · Posted 12-05-2016 05:19 PM

Hello experts,

Please, I want to optimise the sas sql query, for the hundred of variables and more than 3 000 000 observations.

when I do like below, but , it will take more than 10 minutes

proc sql;

create table toto as

select

v1 as w1,

v2,

case

...when

...

else

as v3,

/* many cases conditions of case are used */...

...

..,

vn

from ( select * from dat1 where y1='O' and f1 ne 'E') as M1 left join beta as M2

on M1.id=M2.id

left join omega as M3

on M2.tr=M3.tr;

quit;

I think using merge with dataset ? will be more faster ?

Patrick · Posted 12-05-2016 05:54 PM

Assuming tables Beta and Omega don't have high volumes, a hash lookup could perform quite well.

data toto;
set dat1(where=(y1='O' and f1 ne 'E'));

if _n_=1 then

do;

...define hashes for Beta

end;

...hash lookups

... all your case statements reformulated as if..then..else or as Select ...When ..

run;

PGStats · Posted 12-05-2016 06:00 PM

You don't give much details about those CASE expressions. First try to remove the subquery like this

....
from 
	dat1 as M1 left join 
	beta as M2 on M1.id=M2.id left join 
	omega as M3 on M2.tr=M3.tr
where M1.y1='O' and M1.f1 ne 'E';

and if column tr exists in table dat1, you should use on M1.tr=M3.tr instead of on M2.tr=M3.tr.

PG

mkeintz · Posted 12-05-2016 11:31 PM

Question about the subquery removal.

Is there any benefit to using a data set where parameter (as per the program fragment below), as opposed to having the "where m1.y1='0' and m1.f1 NE 'E'" sql clause? In the DATA step world, this can often improve performance by outsourcing the filtering process to the data access engine. Perhaps proc sql knows to outsource when feasible.

  ....
from 
	dat1 (where=(y1='0'and f1 ^= 'E')) as M1 left join 
	beta as M2 on M1.id=M2.id left join 
	omega as M3 on M2.tr=M3.tr
    ;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ChrisNZ · Posted 12-06-2016 12:46 AM

Using the where clause as a data set option or as a SQL clause should make no difference.

It is a matter of preference, sticking to the standards if this matters, and legibility.

I like the option when possible as it makes the contribution of each data set more obvious.

Data steps sometimes do behave differently depending on whether a clause or an option is supplied, but they shouldn't, and this behaviour is a defect.

High-Performance SAS Coding - Third Edition

Reeza · Posted 12-05-2016 06:24 PM

Depending on the CASE statements, a format could be helpful.

RW9 · Posted 12-06-2016 05:27 AM

Sounds to me like your data modelling is a failure to start with:
" for the hundred of variables"

Have never seen a good working system with so many variables, your select will be huge. Doesn't matter how well you write code, if you input data is rubbish then the code will not work well.

LineMoon · Posted 12-07-2016 02:45 AM

@ll: Thank you all for your answer.@ll: Thank you all for your answer.

To pricise some points

> clause when in sas sql, the case, I use is like this:

case
when v1 > 12 and W1=‘O’ then 1
when V2 >= 0 and W2=‘F’ then 2

.....
               when vn < 0 or Hn not in (‘1’,‘2’) then .
               else 100
           end as w_n

I use it with left join.

I want to have a real time cpu <1 minutes, now I have 10 minutes.

Thank you again

ChrisNZ · Posted 12-07-2016 04:31 AM

What is the bottleneck? Please post the complete fullstimer output from the log.

High-Performance SAS Coding - Third Edition

LineMoon · Posted 12-14-2016 03:54 PM

@ChrisNZ: Thank you for your answer.

as I said before it is about 10 minutes

ChrisNZ · Posted 12-14-2016 05:03 PM

Feel free to not supply the information requested and not get the help you want.

High-Performance SAS Coding - Third Edition

Optimise the sas sql query

Re: Optimise the sas sql query

Re: Optimise the sas sql query

Re: Optimise the sas sql query

Re: Optimise the sas sql query

Re: Optimise the sas sql query

Re: Optimise the sas sql query

Re: Optimise the sas sql query

Re: Optimise the sas sql query

Re: Optimise the sas sql query

Re: Optimise the sas sql query

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away