Hello,
I am having an issue with the join below as it takes for ever to run, any suggestions would be great.
proc sql;
create table Source_Lights as
select datepart(a.date)format Date9. as date,a.employee,a.count(a.timestamp) as Totals, sum(a.Valid) as Valid
from Lights a inner join Emp_List b
on (a.employee = b.employee or a.employee = b.legacyemployee)
and a.date = b.date
group by a.date, a.employee;
quit;
Without the or a.employee = b.legacyemployee it works but when this is added it just spolls and still has yet to complete after an hour.
Fred
I can't think of an explanation as to why the ON statement would be processed different from a where join but have you tried
...
from lights a, emp_list b
where (a.employee=b.employee or a.employee=b.legacyemployee) and a.date=b.date
group by a.date, a.employee;
quit;
Else, you can always try to merge 2 subqueries to kill that or statement
proc sql;
create table source_light as
select * from (
select datepart(a.date)format Date9. as date,a.employee,a.count(a.timestamp) as Totals, sum(a.Valid) as Valid
from Lights a inner join Emp_List b
on (a.employee = b.employee)
and a.date = b.date
group by a.date, a.employee
)
union
select * from (
select datepart(a.date)format Date9. as date,a.employee,a.count(a.timestamp) as Totals, sum(a.Valid) as Valid
from Lights a inner join Emp_List b
on (a.employee = b.legacyemployee)
and a.date = b.date
group by a.date, a.employee
)
;
quit;
Not sure either will really improve the computing speed but until a better answer comes up, it is worth testing.
Vincent
*edit* removed the bad semi columns as depicted by PG
Yes, Vincent's solution should improve things a lot. Furthermore, if no record in Emp_List has both employee AND legacyemployee then replacing UNION by UNION ALL will improve things still further.
Syntax wise, you must remove the semicolons after a.employee, they don't belong there, and you can remove the select * from ( ) wrappers, they are not required.
PG
Thanks PG for your help
Fred
Thank you Vince this worked like a charm after removing the semicolons as PG advised.
Perfect
Fred
Does b.employee have a value if b.legacyemployee is populated? My thinking is that if b.employee is missing in these instances then putting 'a.employee=coalesce(b.employee,b.legacyemployee)' may speed things up a bit.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.