BookmarkSubscribeRSS Feed
LineMoon
Lapis Lazuli | Level 10

Hello experts,

Please, I want to optimise the sas sql query, for the hundred of variables and more than 3 000 000 observations.

when I do like below, but , it will take more than 10 minutes

 

proc sql;

 create table toto as

select

v1 as w1,

v2,

case

...when

...

else

as v3,

 

/* many cases conditions of case are used */...

...

..,

vn

from ( select * from dat1 where y1='O' and f1 ne 'E') as M1 left join beta as M2

on M1.id=M2.id

left join omega as M3

on M2.tr=M3.tr;

quit;

I think using merge with dataset ? will be more faster ?

 

10 REPLIES 10
Patrick
Opal | Level 21

Assuming tables Beta and Omega don't have high volumes, a hash lookup could perform quite well.

 

data toto;
set dat1(where=(y1='O' and f1 ne 'E'));

if _n_=1 then

do;

...define hashes for Beta

end;

 

...hash lookups

 

... all your case statements reformulated as if..then..else or as Select ...When ..


run;

PGStats
Opal | Level 21

You don't give much details about those CASE expressions. First try to remove the subquery like this

 

....
from 
	dat1 as M1 left join 
	beta as M2 on M1.id=M2.id left join 
	omega as M3 on M2.tr=M3.tr
where M1.y1='O' and M1.f1 ne 'E';

and if column tr exists in table dat1, you should use on M1.tr=M3.tr instead of on M2.tr=M3.tr.

PG
mkeintz
PROC Star

Question about the subquery removal.

 

Is there any benefit to using a data set where parameter (as per the program fragment below), as opposed to having the "where m1.y1='0' and m1.f1 NE 'E'" sql clause?  In the DATA step world, this can often improve performance by outsourcing the filtering process to the data access engine. Perhaps proc sql knows to outsource when feasible.

 

  ....
from 
	dat1 (where=(y1='0'and f1 ^= 'E')) as M1 left join 
	beta as M2 on M1.id=M2.id left join 
	omega as M3 on M2.tr=M3.tr
    ;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

Using the where clause as a data set option or as a SQL clause should make no difference.

It is a matter of preference, sticking to the standards if this matters, and legibility.

I like the option when possible as it makes the contribution of each data set more obvious.

 

Data steps sometimes do behave differently depending on whether a clause or an option is supplied, but they shouldn't, and this behaviour is a defect.

 

 

 

 

Reeza
Super User

Depending on the CASE statements, a format could be helpful.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sounds to me like your data modelling is a failure to start with:
" for the hundred of variables"

Have never seen a good working system with so many variables, your select will be huge.  Doesn't matter how well you write code, if you input data is rubbish then the code will not work well.

LineMoon
Lapis Lazuli | Level 10

@ll: Thank you all for your answer.@ll: Thank you all for your answer.

 

To pricise some points

 

> clause when in sas sql, the case, I use is like this:

 

case
               when v1 > 12 and W1=‘O’ then 1
               when V2 >= 0 and W2=‘F’ then 2

.....
               when vn < 0 or Hn not in (‘1’,‘2’) then .
               else 100
           end as w_n

I use it with left join.

I want to have a real time cpu <1 minutes, now I have 10 minutes.

Thank you again

 

ChrisNZ
Tourmaline | Level 20
What is the bottleneck? Please post the complete fullstimer output from the log.
LineMoon
Lapis Lazuli | Level 10

@ChrisNZ: Thank you for your answer.

as I said before it is about 10 minutes

ChrisNZ
Tourmaline | Level 20

Feel free to not supply the information requested and not get the help you want.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1096 views
  • 0 likes
  • 7 in conversation