Hi,
I am new to SAS, so please bear with me, and I am mostly using it for data manipulation. The problem I am trying to solve is updating a variable in a dataset based upon value from another dataset. My datastructure is below:
Dataset1 - Agent_Shift
Variables: Report_Date, Employee_ID, Shift_Code, Avail_Category, Start_moment (timestamp), End_Moment (timestamp)
This data set has the timespans for an agent's Shift, Overtime, and Makeup time. Below, for example Agent1234 is scheduled to work from 01MAR2015:07:00:00 to 01MAR2015:16:30:00 is the schedule span / shift and the same employee has Overtime from 16:30 to 18:00.
Agent_Shift dataset: This dataset would update the Avail_category of Agent_Schedule_Details below
Report_Date | Agent_ID | Shift_Code | Start_Moment | End_Moment | Avail_Category |
1-Mar-15 | 1234 | SHIFT | 01MAR2015:07:00:00.000000 | 01MAR2015:16:30:00.000000 | Schedule Span |
1-Mar-15 | 1234 | OVPH | 01MAR2015:16:30:00.000000 | 01MAR2015:18:00:00.000000 | Gross OT |
Dataset2 - Agent_Schedule_Details
Variables: Report_Date, Employee_ID, Shift_Code, Avail_Category (currently blank), Start_moment (timestamp), End_Moment (timestamp)
This dataset contains the transactional timestamps of an agents workday. For example agent 1234 was scheduled for production work at 5 different times throughout the day (PRD), had 2 breaks at specific time stamps as well as a lunch.
Agent_Schedule_Details dataset (Avail_Category will need to be updated from Avail Category of Agent_Shift)
Report_Date | Agent_ID | Shift_Code | Start_Moment | End_Moment | Avail_Category |
1-Mar-15 | 1234 | CKEY | 01MAR2015:07:00:00.000000 | 01MAR2015:07:08:00.000000 | Should be Schedule Span |
1-Mar-15 | 1234 | PRD | 01MAR2015:07:08:00.000000 | 01MAR2015:10:30:00.000000 | Should be Schedule Span |
1-Mar-15 | 1234 | BRK1 | 01MAR2015:10:30:00.000000 | 01MAR2015:10:45:00.000000 | Should be Schedule Span |
1-Mar-15 | 1234 | PRD | 01MAR2015:10:45:00.000000 | 01MAR2015:14:00:00.000000 | Should be Schedule Span |
1-Mar-15 | 1234 | LUNCH | 01MAR2015:14:00:00.000000 | 01MAR2015:14:30:00.000000 | Should be Schedule Span |
1-Mar-15 | 1234 | PRD | 01MAR2015:14:30:00.000000 | 01MAR2015:15:37:00.000000 | Should be Schedule Span |
1-Mar-15 | 1234 | BRKL | 01MAR2015:15:37:00.000000 | 01MAR2015:15:52:00.000000 | Should be Schedule Span |
1-Mar-15 | 1234 | PRD | 01MAR2015:15:52:00.000000 | 01MAR2015:16:30:00.000000 | Should be Schedule Span |
1-Mar-15 | 1234 | BRKO | 01MAR2015:16:45:00.000000 | 01MAR2015:17:00:00.000000 | Should be Gross OT |
1-Mar-15 | 1234 | PRD | 01MAR2015:17:00:00.000000 | 01MAR2015:18:00:00.000000 | Should be Gross OT |
The issue I am having is how do I update the Avail category based on the Avail_Category and timestamp of the Agent_Shift dataset? I know there should be logic to see if the start and stop timestamps of each observation within Agent_Schedule_Details falls with the time stamp of the observations for the Agents_Shift. I am just struggling with how to do this update.
If the Detail periods never span more than two consecutive shift categories, then you can fix the transactional data by splitting the spanning observations in two:
option linesize=120;
data agent_shift;
input Report_Date :anydtdte. Agent_ID Shift_Code $
(Start_Moment End_Moment) (:datetime.) Avail_Category & :$20.;
format Report_Date date9. Start_Moment End_Moment datetime15.;
datalines;
1-Mar-15 1234 SHIFT 01MAR2015:07:00:00.000000 01MAR2015:16:30:00.000000 Schedule Span
1-Mar-15 1234 OVPH 01MAR2015:16:30:00.000000 01MAR2015:18:00:00.000000 Gross OT
;
data Agent_Schedule_Details;
input Report_Date :anydtdte. Agent_ID Shift_Code $
(Start_Moment End_Moment) (:datetime.);
format Report_Date date9. Start_Moment End_Moment datetime15.;
datalines;
1-Mar-15 1234 CKEY 01MAR2015:07:00:00.000000 01MAR2015:07:08:00.000000
1-Mar-15 1234 PRD 01MAR2015:07:08:00.000000 01MAR2015:10:30:00.000000
1-Mar-15 1234 BRK1 01MAR2015:10:30:00.000000 01MAR2015:10:45:00.000000
1-Mar-15 1234 PRD 01MAR2015:10:45:00.000000 01MAR2015:14:00:00.000000
1-Mar-15 1234 LUNCH 01MAR2015:14:00:00.000000 01MAR2015:14:30:00.000000
1-Mar-15 1234 PRD 01MAR2015:14:30:00.000000 01MAR2015:15:37:00.000000
1-Mar-15 1234 BRKL 01MAR2015:15:37:00.000000 01MAR2015:15:52:00.000000
1-Mar-15 1234 PRD 01MAR2015:15:52:00.000000 01MAR2015:16:40:00.000000
1-Mar-15 1234 BRKO 01MAR2015:16:45:00.000000 01MAR2015:17:00:00.000000
1-Mar-15 1234 PRD 01MAR2015:17:00:00.000000 01MAR2015:18:00:00.000000
;
proc sql;
create table want as
select
l.Report_Date,
l.Agent_ID,
l.Shift_Code,
l.Start_Moment,
min(l.End_Moment, r.End_Moment) as End_Moment format=datetime15.,
r.Avail_Category
from
Agent_Schedule_Details as l left join
Agent_Shift as r on
l.Agent_ID = r.Agent_ID and
l.Start_Moment >= r.Start_Moment and
l.Start_moment < r.End_Moment
union
select
l.Report_Date,
l.Agent_ID,
l.Shift_Code,
max(l.Start_Moment, r.Start_Moment) as Start_Moment format=datetime15.,
l.End_Moment,
r.Avail_Category
from
Agent_Schedule_Details as l left join
Agent_Shift as r on
l.Agent_ID = r.Agent_ID and
l.End_Moment > r.Start_Moment and
l.End_moment <= r.End_Moment
order by Agent_ID, Start_Moment
;
select * from want;
quit;
Report_Date Agent_ID Shift_Code Start_Moment End_Moment Avail_Category
-----------------------------------------------------------------------------------
01MAR2015 1234 CKEY 01MAR15:07:00 01MAR15:07:08 Schedule Span
01MAR2015 1234 PRD 01MAR15:07:08 01MAR15:10:30 Schedule Span
01MAR2015 1234 BRK1 01MAR15:10:30 01MAR15:10:45 Schedule Span
01MAR2015 1234 PRD 01MAR15:10:45 01MAR15:14:00 Schedule Span
01MAR2015 1234 LUNCH 01MAR15:14:00 01MAR15:14:30 Schedule Span
01MAR2015 1234 PRD 01MAR15:14:30 01MAR15:15:37 Schedule Span
01MAR2015 1234 BRKL 01MAR15:15:37 01MAR15:15:52 Schedule Span
01MAR2015 1234 PRD 01MAR15:15:52 01MAR15:16:30 Schedule Span
01MAR2015 1234 PRD 01MAR15:16:30 01MAR15:16:40 Gross OT
01MAR2015 1234 BRKO 01MAR15:16:45 01MAR15:17:00 Gross OT
01MAR2015 1234 PRD 01MAR15:17:00 01MAR15:18:00 Gross OT
Note: You can't use between as your join condtion because consecutine End and Start shift times are exactly the same (16:30) and thus could both match your Detail Start_Moment.
PG
proc sql;
create table want as
select l.*, r.Avail_Category
from Agent_Schedule_Details as l left join Agent_Shift as r on (
l.Agent_ID=r.Agent_ID and l.Start_Moment between r.Start_Moment and r.End_Moment
)
;
quit;
The tricky part is in bold. You should precisely define how to match observations.
What if there is an activity for agent 1234 from 16:25 to 16:35?
If the Detail periods never span more than two consecutive shift categories, then you can fix the transactional data by splitting the spanning observations in two:
option linesize=120;
data agent_shift;
input Report_Date :anydtdte. Agent_ID Shift_Code $
(Start_Moment End_Moment) (:datetime.) Avail_Category & :$20.;
format Report_Date date9. Start_Moment End_Moment datetime15.;
datalines;
1-Mar-15 1234 SHIFT 01MAR2015:07:00:00.000000 01MAR2015:16:30:00.000000 Schedule Span
1-Mar-15 1234 OVPH 01MAR2015:16:30:00.000000 01MAR2015:18:00:00.000000 Gross OT
;
data Agent_Schedule_Details;
input Report_Date :anydtdte. Agent_ID Shift_Code $
(Start_Moment End_Moment) (:datetime.);
format Report_Date date9. Start_Moment End_Moment datetime15.;
datalines;
1-Mar-15 1234 CKEY 01MAR2015:07:00:00.000000 01MAR2015:07:08:00.000000
1-Mar-15 1234 PRD 01MAR2015:07:08:00.000000 01MAR2015:10:30:00.000000
1-Mar-15 1234 BRK1 01MAR2015:10:30:00.000000 01MAR2015:10:45:00.000000
1-Mar-15 1234 PRD 01MAR2015:10:45:00.000000 01MAR2015:14:00:00.000000
1-Mar-15 1234 LUNCH 01MAR2015:14:00:00.000000 01MAR2015:14:30:00.000000
1-Mar-15 1234 PRD 01MAR2015:14:30:00.000000 01MAR2015:15:37:00.000000
1-Mar-15 1234 BRKL 01MAR2015:15:37:00.000000 01MAR2015:15:52:00.000000
1-Mar-15 1234 PRD 01MAR2015:15:52:00.000000 01MAR2015:16:40:00.000000
1-Mar-15 1234 BRKO 01MAR2015:16:45:00.000000 01MAR2015:17:00:00.000000
1-Mar-15 1234 PRD 01MAR2015:17:00:00.000000 01MAR2015:18:00:00.000000
;
proc sql;
create table want as
select
l.Report_Date,
l.Agent_ID,
l.Shift_Code,
l.Start_Moment,
min(l.End_Moment, r.End_Moment) as End_Moment format=datetime15.,
r.Avail_Category
from
Agent_Schedule_Details as l left join
Agent_Shift as r on
l.Agent_ID = r.Agent_ID and
l.Start_Moment >= r.Start_Moment and
l.Start_moment < r.End_Moment
union
select
l.Report_Date,
l.Agent_ID,
l.Shift_Code,
max(l.Start_Moment, r.Start_Moment) as Start_Moment format=datetime15.,
l.End_Moment,
r.Avail_Category
from
Agent_Schedule_Details as l left join
Agent_Shift as r on
l.Agent_ID = r.Agent_ID and
l.End_Moment > r.Start_Moment and
l.End_moment <= r.End_Moment
order by Agent_ID, Start_Moment
;
select * from want;
quit;
Report_Date Agent_ID Shift_Code Start_Moment End_Moment Avail_Category
-----------------------------------------------------------------------------------
01MAR2015 1234 CKEY 01MAR15:07:00 01MAR15:07:08 Schedule Span
01MAR2015 1234 PRD 01MAR15:07:08 01MAR15:10:30 Schedule Span
01MAR2015 1234 BRK1 01MAR15:10:30 01MAR15:10:45 Schedule Span
01MAR2015 1234 PRD 01MAR15:10:45 01MAR15:14:00 Schedule Span
01MAR2015 1234 LUNCH 01MAR15:14:00 01MAR15:14:30 Schedule Span
01MAR2015 1234 PRD 01MAR15:14:30 01MAR15:15:37 Schedule Span
01MAR2015 1234 BRKL 01MAR15:15:37 01MAR15:15:52 Schedule Span
01MAR2015 1234 PRD 01MAR15:15:52 01MAR15:16:30 Schedule Span
01MAR2015 1234 PRD 01MAR15:16:30 01MAR15:16:40 Gross OT
01MAR2015 1234 BRKO 01MAR15:16:45 01MAR15:17:00 Gross OT
01MAR2015 1234 PRD 01MAR15:17:00 01MAR15:18:00 Gross OT
Note: You can't use between as your join condtion because consecutine End and Start shift times are exactly the same (16:30) and thus could both match your Detail Start_Moment.
PG
Thanks, PGStats. That is exactly what I was looking for
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.