Help using Base SAS procedures

Join on or

Reply
Contributor
Posts: 38

Join on or

Hello,

I am having an issue with the join below as it takes for ever to run, any suggestions would be great.

proc sql;

create table Source_Lights as

select datepart(a.date)format Date9. as date,a.employee,a.count(a.timestamp) as Totals, sum(a.Valid) as Valid

from Lights a inner join Emp_List b

on (a.employee = b.employee or a.employee = b.legacyemployee)

and a.date = b.date

group by a.date, a.employee;

quit;

Without the or a.employee = b.legacyemployee it works but when this is added it just spolls and still has yet to complete after an hour.

Fred


Super Contributor
Posts: 339

Re: Join on or

Posted in reply to fred_major

I can't think of an explanation as to why the ON statement would be processed different from a where join but have you tried

...

from lights a, emp_list b

where (a.employee=b.employee or a.employee=b.legacyemployee) and a.date=b.date

group by a.date, a.employee;

quit;

Else, you can always try to merge 2 subqueries to kill that or statement

proc sql;

     create table source_light as

     select * from (

          select datepart(a.date)format Date9. as date,a.employee,a.count(a.timestamp) as Totals, sum(a.Valid) as Valid

          from Lights a inner join Emp_List b

          on (a.employee = b.employee)

          and a.date = b.date

          group by a.date, a.employee

          )

     union

     select * from     (

     select datepart(a.date)format Date9. as date,a.employee,a.count(a.timestamp) as Totals, sum(a.Valid) as Valid

     from Lights a inner join Emp_List b

     on (a.employee = b.legacyemployee)

     and a.date = b.date

     group by a.date, a.employee

          )

;

quit;

Not sure either will really improve the computing speed but until a better answer comes up, it is worth testing.

Vincent

*edit* removed the bad semi columns as depicted by PG

Respected Advisor
Posts: 4,919

Re: Join on or

Posted in reply to Vince28_Statcan

Yes, Vincent's solution should improve things a lot. Furthermore, if no record in Emp_List has both employee AND legacyemployee then replacing UNION by UNION ALL will improve things still further.

Syntax wise, you must remove the semicolons after a.employee, they don't belong there, and you can remove the select * from ( ) wrappers, they are not required.

PG

PG
Contributor
Posts: 38

Re: Join on or

Thanks PG for your help

Fred

Contributor
Posts: 38

Re: Join on or

Posted in reply to Vince28_Statcan

Thank you Vince this worked like a charm after removing the semicolons as PG advised.

Perfect

Fred

Regular Contributor
Posts: 151

Re: Join on or

Posted in reply to fred_major

Does b.employee have a value if b.legacyemployee is populated?  My thinking is that if b.employee is missing in these instances then putting 'a.employee=coalesce(b.employee,b.legacyemployee)' may speed things up a bit.

Ask a Question
Discussion stats
  • 5 replies
  • 205 views
  • 0 likes
  • 4 in conversation