BookmarkSubscribeRSS Feed
fred_major
Calcite | Level 5

Hello,

I am having an issue with the join below as it takes for ever to run, any suggestions would be great.

proc sql;

create table Source_Lights as

select datepart(a.date)format Date9. as date,a.employee,a.count(a.timestamp) as Totals, sum(a.Valid) as Valid

from Lights a inner join Emp_List b

on (a.employee = b.employee or a.employee = b.legacyemployee)

and a.date = b.date

group by a.date, a.employee;

quit;

Without the or a.employee = b.legacyemployee it works but when this is added it just spolls and still has yet to complete after an hour.

Fred


5 REPLIES 5
Vince28_Statcan
Quartz | Level 8

I can't think of an explanation as to why the ON statement would be processed different from a where join but have you tried

...

from lights a, emp_list b

where (a.employee=b.employee or a.employee=b.legacyemployee) and a.date=b.date

group by a.date, a.employee;

quit;

Else, you can always try to merge 2 subqueries to kill that or statement

proc sql;

     create table source_light as

     select * from (

          select datepart(a.date)format Date9. as date,a.employee,a.count(a.timestamp) as Totals, sum(a.Valid) as Valid

          from Lights a inner join Emp_List b

          on (a.employee = b.employee)

          and a.date = b.date

          group by a.date, a.employee

          )

     union

     select * from     (

     select datepart(a.date)format Date9. as date,a.employee,a.count(a.timestamp) as Totals, sum(a.Valid) as Valid

     from Lights a inner join Emp_List b

     on (a.employee = b.legacyemployee)

     and a.date = b.date

     group by a.date, a.employee

          )

;

quit;

Not sure either will really improve the computing speed but until a better answer comes up, it is worth testing.

Vincent

*edit* removed the bad semi columns as depicted by PG

PGStats
Opal | Level 21

Yes, Vincent's solution should improve things a lot. Furthermore, if no record in Emp_List has both employee AND legacyemployee then replacing UNION by UNION ALL will improve things still further.

Syntax wise, you must remove the semicolons after a.employee, they don't belong there, and you can remove the select * from ( ) wrappers, they are not required.

PG

PG
fred_major
Calcite | Level 5

Thanks PG for your help

Fred

fred_major
Calcite | Level 5

Thank you Vince this worked like a charm after removing the semicolons as PG advised.

Perfect

Fred

Keith
Obsidian | Level 7

Does b.employee have a value if b.legacyemployee is populated?  My thinking is that if b.employee is missing in these instances then putting 'a.employee=coalesce(b.employee,b.legacyemployee)' may speed things up a bit.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1133 views
  • 0 likes
  • 4 in conversation