Solved: Updating dataset based on values of another dataset

ksbarnz · Posted 05-05-2015 04:49 PM

Hi,

I am new to SAS, so please bear with me, and I am mostly using it for data manipulation. The problem I am trying to solve is updating a variable in a dataset based upon value from another dataset. My datastructure is below:

Dataset1 - Agent_Shift

Variables: Report_Date, Employee_ID, Shift_Code, Avail_Category, Start_moment (timestamp), End_Moment (timestamp)

This data set has the timespans for an agent's Shift, Overtime, and Makeup time. Below, for example Agent1234 is scheduled to work from 01MAR2015:07:00:00 to 01MAR2015:16:30:00 is the schedule span / shift and the same employee has Overtime from 16:30 to 18:00.

Agent_Shift dataset: This dataset would update the Avail_category of Agent_Schedule_Details below

Report_Date	Agent_ID	Shift_Code	Start_Moment	End_Moment	Avail_Category
1-Mar-15	1234	SHIFT	01MAR2015:07:00:00.000000	01MAR2015:16:30:00.000000	Schedule Span
1-Mar-15	1234	OVPH	01MAR2015:16:30:00.000000	01MAR2015:18:00:00.000000	Gross OT

Dataset2 - Agent_Schedule_Details

Variables: Report_Date, Employee_ID, Shift_Code, Avail_Category (currently blank), Start_moment (timestamp), End_Moment (timestamp)

This dataset contains the transactional timestamps of an agents workday. For example agent 1234 was scheduled for production work at 5 different times throughout the day (PRD), had 2 breaks at specific time stamps as well as a lunch.

Agent_Schedule_Details dataset (Avail_Category will need to be updated from Avail Category of Agent_Shift)

Report_Date	Agent_ID	Shift_Code	Start_Moment	End_Moment	Avail_Category
1-Mar-15	1234	CKEY	01MAR2015:07:00:00.000000	01MAR2015:07:08:00.000000	Should be Schedule Span
1-Mar-15	1234	PRD	01MAR2015:07:08:00.000000	01MAR2015:10:30:00.000000	Should be Schedule Span
1-Mar-15	1234	BRK1	01MAR2015:10:30:00.000000	01MAR2015:10:45:00.000000	Should be Schedule Span
1-Mar-15	1234	PRD	01MAR2015:10:45:00.000000	01MAR2015:14:00:00.000000	Should be Schedule Span
1-Mar-15	1234	LUNCH	01MAR2015:14:00:00.000000	01MAR2015:14:30:00.000000	Should be Schedule Span
1-Mar-15	1234	PRD	01MAR2015:14:30:00.000000	01MAR2015:15:37:00.000000	Should be Schedule Span
1-Mar-15	1234	BRKL	01MAR2015:15:37:00.000000	01MAR2015:15:52:00.000000	Should be Schedule Span
1-Mar-15	1234	PRD	01MAR2015:15:52:00.000000	01MAR2015:16:30:00.000000	Should be Schedule Span
1-Mar-15	1234	BRKO	01MAR2015:16:45:00.000000	01MAR2015:17:00:00.000000	Should be Gross OT
1-Mar-15	1234	PRD	01MAR2015:17:00:00.000000	01MAR2015:18:00:00.000000	Should be Gross OT

The issue I am having is how do I update the Avail category based on the Avail_Category and timestamp of the Agent_Shift dataset? I know there should be logic to see if the start and stop timestamps of each observation within Agent_Schedule_Details falls with the time stamp of the observations for the Agents_Shift. I am just struggling with how to do this update.

PGStats · Posted 05-05-2015 11:19 PM

If the Detail periods never span more than two consecutive shift categories, then you can fix the transactional data by splitting the spanning observations in two:

option linesize=120;

data agent_shift;

input Report_Date :anydtdte. Agent_ID Shift_Code $

(Start_Moment End_Moment) (:datetime.) Avail_Category & :$20.;

format Report_Date date9. Start_Moment End_Moment datetime15.;

datalines;

1-Mar-15 1234 SHIFT 01MAR2015:07:00:00.000000 01MAR2015:16:30:00.000000 Schedule Span

1-Mar-15 1234 OVPH 01MAR2015:16:30:00.000000 01MAR2015:18:00:00.000000 Gross OT

;

data Agent_Schedule_Details;

input Report_Date :anydtdte. Agent_ID Shift_Code $

(Start_Moment End_Moment) (:datetime.);

format Report_Date date9. Start_Moment End_Moment datetime15.;

datalines;

1-Mar-15 1234 CKEY 01MAR2015:07:00:00.000000 01MAR2015:07:08:00.000000

1-Mar-15 1234 PRD 01MAR2015:07:08:00.000000 01MAR2015:10:30:00.000000

1-Mar-15 1234 BRK1 01MAR2015:10:30:00.000000 01MAR2015:10:45:00.000000

1-Mar-15 1234 PRD 01MAR2015:10:45:00.000000 01MAR2015:14:00:00.000000

1-Mar-15 1234 LUNCH 01MAR2015:14:00:00.000000 01MAR2015:14:30:00.000000

1-Mar-15 1234 PRD 01MAR2015:14:30:00.000000 01MAR2015:15:37:00.000000

1-Mar-15 1234 BRKL 01MAR2015:15:37:00.000000 01MAR2015:15:52:00.000000

1-Mar-15 1234 PRD 01MAR2015:15:52:00.000000 01MAR2015:16:40:00.000000

1-Mar-15 1234 BRKO 01MAR2015:16:45:00.000000 01MAR2015:17:00:00.000000

1-Mar-15 1234 PRD 01MAR2015:17:00:00.000000 01MAR2015:18:00:00.000000

;

proc sql;

create table want as

select

l.Report_Date,

l.Agent_ID,

l.Shift_Code,

l.Start_Moment,

min(l.End_Moment, r.End_Moment) as End_Moment format=datetime15.,

r.Avail_Category

from

Agent_Schedule_Details as l left join

Agent_Shift as r on

l.Agent_ID = r.Agent_ID and

l.Start_Moment >= r.Start_Moment and

l.Start_moment < r.End_Moment

union

select

l.Report_Date,

l.Agent_ID,

l.Shift_Code,

max(l.Start_Moment, r.Start_Moment) as Start_Moment format=datetime15.,

l.End_Moment,

r.Avail_Category

from

Agent_Schedule_Details as l left join

Agent_Shift as r on

l.Agent_ID = r.Agent_ID and

l.End_Moment > r.Start_Moment and

l.End_moment <= r.End_Moment

order by Agent_ID, Start_Moment

;

select * from want;

quit;

Report_Date Agent_ID Shift_Code Start_Moment End_Moment Avail_Category

-----------------------------------------------------------------------------------

01MAR2015 1234 CKEY 01MAR15:07:00 01MAR15:07:08 Schedule Span

01MAR2015 1234 PRD 01MAR15:07:08 01MAR15:10:30 Schedule Span

01MAR2015 1234 BRK1 01MAR15:10:30 01MAR15:10:45 Schedule Span

01MAR2015 1234 PRD 01MAR15:10:45 01MAR15:14:00 Schedule Span

01MAR2015 1234 LUNCH 01MAR15:14:00 01MAR15:14:30 Schedule Span

01MAR2015 1234 PRD 01MAR15:14:30 01MAR15:15:37 Schedule Span

01MAR2015 1234 BRKL 01MAR15:15:37 01MAR15:15:52 Schedule Span

01MAR2015 1234 PRD 01MAR15:15:52 01MAR15:16:30 Schedule Span

01MAR2015 1234 PRD 01MAR15:16:30 01MAR15:16:40 Gross OT

01MAR2015 1234 BRKO 01MAR15:16:45 01MAR15:17:00 Gross OT

01MAR2015 1234 PRD 01MAR15:17:00 01MAR15:18:00 Gross OT

Note: You can't use between as your join condtion because consecutine End and Start shift times are exactly the same (16:30) and thus could both match your Detail Start_Moment.

PG

View solution in original post

gergely_batho · Posted 05-05-2015 05:25 PM

proc sql;

create table want as

select l.*, r.Avail_Category

from Agent_Schedule_Details as l left join Agent_Shift as r on (

l.Agent_ID=r.Agent_ID and l.Start_Moment between r.Start_Moment and r.End_Moment

)

;

quit;

The tricky part is in bold. You should precisely define how to match observations.

What if there is an activity for agent 1234 from 16:25 to 16:35?

PGStats · Posted 05-05-2015 11:19 PM