BookmarkSubscribeRSS Feed
fred_major
Calcite | Level 5

Hello,

I am having an issue with the join below as it takes for ever to run, any suggestions would be great.

proc sql;

create table Source_Lights as

select datepart(a.date)format Date9. as date,a.employee,a.count(a.timestamp) as Totals, sum(a.Valid) as Valid

from Lights a inner join Emp_List b

on (a.employee = b.employee or a.employee = b.legacyemployee)

and a.date = b.date

group by a.date, a.employee;

quit;

Without the or a.employee = b.legacyemployee it works but when this is added it just spolls and still has yet to complete after an hour.

Fred


5 REPLIES 5
Vince28_Statcan
Quartz | Level 8

I can't think of an explanation as to why the ON statement would be processed different from a where join but have you tried

...

from lights a, emp_list b

where (a.employee=b.employee or a.employee=b.legacyemployee) and a.date=b.date

group by a.date, a.employee;

quit;

Else, you can always try to merge 2 subqueries to kill that or statement

proc sql;

     create table source_light as

     select * from (

          select datepart(a.date)format Date9. as date,a.employee,a.count(a.timestamp) as Totals, sum(a.Valid) as Valid

          from Lights a inner join Emp_List b

          on (a.employee = b.employee)

          and a.date = b.date

          group by a.date, a.employee

          )

     union

     select * from     (

     select datepart(a.date)format Date9. as date,a.employee,a.count(a.timestamp) as Totals, sum(a.Valid) as Valid

     from Lights a inner join Emp_List b

     on (a.employee = b.legacyemployee)

     and a.date = b.date

     group by a.date, a.employee

          )

;

quit;

Not sure either will really improve the computing speed but until a better answer comes up, it is worth testing.

Vincent

*edit* removed the bad semi columns as depicted by PG

PGStats
Opal | Level 21

Yes, Vincent's solution should improve things a lot. Furthermore, if no record in Emp_List has both employee AND legacyemployee then replacing UNION by UNION ALL will improve things still further.

Syntax wise, you must remove the semicolons after a.employee, they don't belong there, and you can remove the select * from ( ) wrappers, they are not required.

PG

PG
fred_major
Calcite | Level 5

Thanks PG for your help

Fred

fred_major
Calcite | Level 5

Thank you Vince this worked like a charm after removing the semicolons as PG advised.

Perfect

Fred

Keith
Obsidian | Level 7

Does b.employee have a value if b.legacyemployee is populated?  My thinking is that if b.employee is missing in these instances then putting 'a.employee=coalesce(b.employee,b.legacyemployee)' may speed things up a bit.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 804 views
  • 0 likes
  • 4 in conversation