12-05-2016 05:19 PM - edited 12-05-2016 05:20 PM
Please, I want to optimise the sas sql query, for the hundred of variables and more than 3 000 000 observations.
when I do like below, but , it will take more than 10 minutes
create table toto as
v1 as w1,
/* many cases conditions of case are used */...
from ( select * from dat1 where y1='O' and f1 ne 'E') as M1 left join beta as M2
left join omega as M3
I think using merge with dataset ? will be more faster ?
12-05-2016 05:54 PM - edited 12-05-2016 10:32 PM
Assuming tables Beta and Omega don't have high volumes, a hash lookup could perform quite well.
set dat1(where=(y1='O' and f1 ne 'E'));
if _n_=1 then
...define hashes for Beta
... all your case statements reformulated as if..then..else or as Select ...When ..
12-05-2016 06:00 PM
You don't give much details about those CASE expressions. First try to remove the subquery like this
.... from dat1 as M1 left join beta as M2 on M1.id=M2.id left join omega as M3 on M2.tr=M3.tr where M1.y1='O' and M1.f1 ne 'E';
and if column tr exists in table dat1, you should use on M1.tr=M3.tr instead of on M2.tr=M3.tr.
12-05-2016 11:31 PM
Question about the subquery removal.
Is there any benefit to using a data set where parameter (as per the program fragment below), as opposed to having the "where m1.y1='0' and m1.f1 NE 'E'" sql clause? In the DATA step world, this can often improve performance by outsourcing the filtering process to the data access engine. Perhaps proc sql knows to outsource when feasible.
.... from dat1 (where=(y1='0'and f1 ^= 'E')) as M1 left join beta as M2 on M1.id=M2.id left join omega as M3 on M2.tr=M3.tr ;
12-06-2016 12:46 AM - edited 12-06-2016 12:47 AM
Using the where clause as a data set option or as a SQL clause should make no difference.
It is a matter of preference, sticking to the standards if this matters, and legibility.
I like the option when possible as it makes the contribution of each data set more obvious.
Data steps sometimes do behave differently depending on whether a clause or an option is supplied, but they shouldn't, and this behaviour is a defect.
12-06-2016 05:27 AM
Sounds to me like your data modelling is a failure to start with:
" for the hundred of variables"
Have never seen a good working system with so many variables, your select will be huge. Doesn't matter how well you write code, if you input data is rubbish then the code will not work well.
12-07-2016 02:45 AM
@ll: Thank you all for your answer.@ll: Thank you all for your answer.
To pricise some points
> clause when in sas sql, the case, I use is like this:
when v1 > 12 and W1=‘O’ then 1
when V2 >= 0 and W2=‘F’ then 2
when vn < 0 or Hn not in (‘1’,‘2’) then .
end as w_n
I use it with left join.
I want to have a real time cpu <1 minutes, now I have 10 minutes.
Thank you again