Help using Base SAS procedures

Updating dataset based on values of another dataset

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Updating dataset based on values of another dataset

Hi,

I am new to SAS, so please bear with me, and I am mostly using it for data manipulation.  The problem I am trying to solve is updating a variable in a dataset based upon value from another dataset.  My datastructure is below:

Dataset1 - Agent_Shift

Variables: Report_Date, Employee_ID, Shift_Code, Avail_Category, Start_moment (timestamp), End_Moment (timestamp)

This data set has the timespans for an agent's Shift, Overtime, and Makeup time.  Below, for example Agent1234 is scheduled to work from 01MAR2015:07:00:00 to 01MAR2015:16:30:00 is the schedule span / shift and the same employee has Overtime from 16:30 to 18:00.


Agent_Shift dataset: This dataset would update the Avail_category of Agent_Schedule_Details below

Report_DateAgent_IDShift_CodeStart_MomentEnd_MomentAvail_Category
1-Mar-151234SHIFT01MAR2015:07:00:00.00000001MAR2015:16:30:00.000000Schedule Span
1-Mar-151234OVPH01MAR2015:16:30:00.00000001MAR2015:18:00:00.000000Gross OT


Dataset2 - Agent_Schedule_Details

Variables: Report_Date, Employee_ID, Shift_Code, Avail_Category (currently blank), Start_moment (timestamp), End_Moment (timestamp)

This dataset contains the transactional timestamps of an agents workday.  For example agent 1234 was scheduled for production work at 5 different times throughout the day (PRD), had 2 breaks at specific time stamps as well as a lunch.

Agent_Schedule_Details dataset (Avail_Category will need to be updated from Avail Category of Agent_Shift)

Report_DateAgent_IDShift_CodeStart_MomentEnd_MomentAvail_Category
1-Mar-151234CKEY01MAR2015:07:00:00.00000001MAR2015:07:08:00.000000Should be Schedule Span
1-Mar-151234

PRD

01MAR2015:07:08:00.00000001MAR2015:10:30:00.000000Should be Schedule Span
1-Mar-151234BRK101MAR2015:10:30:00.00000001MAR2015:10:45:00.000000Should be Schedule Span
1-Mar-151234PRD01MAR2015:10:45:00.00000001MAR2015:14:00:00.000000Should be Schedule Span
1-Mar-151234LUNCH01MAR2015:14:00:00.00000001MAR2015:14:30:00.000000Should be Schedule Span
1-Mar-151234PRD01MAR2015:14:30:00.00000001MAR2015:15:37:00.000000Should be Schedule Span
1-Mar-151234BRKL01MAR2015:15:37:00.00000001MAR2015:15:52:00.000000Should be Schedule Span
1-Mar-151234PRD01MAR2015:15:52:00.00000001MAR2015:16:30:00.000000Should be Schedule Span
1-Mar-151234BRKO01MAR2015:16:45:00.00000001MAR2015:17:00:00.000000Should be Gross OT
1-Mar-151234PRD01MAR2015:17:00:00.00000001MAR2015:18:00:00.000000Should be Gross OT

The issue I am having is how do I update the Avail category based on the Avail_Category and timestamp of the Agent_Shift dataset?  I know there should be logic to see if the start and stop timestamps of each observation within Agent_Schedule_Details falls with the time stamp of the observations for the Agents_Shift.  I am just struggling with how to do this update.


Accepted Solutions
Solution
‎05-05-2015 11:19 PM
Respected Advisor
Posts: 4,649

Re: Updating dataset based on values of another dataset

If the Detail periods never span more than two consecutive shift categories, then you can fix the transactional data by splitting the spanning observations in two:

option linesize=120;

data agent_shift;

input Report_Date :anydtdte. Agent_ID Shift_Code $

    (Start_Moment End_Moment) (:datetime.) Avail_Category & :$20.;

format Report_Date date9. Start_Moment End_Moment datetime15.;

datalines;

1-Mar-15 1234 SHIFT 01MAR2015:07:00:00.000000 01MAR2015:16:30:00.000000 Schedule Span

1-Mar-15 1234 OVPH 01MAR2015:16:30:00.000000 01MAR2015:18:00:00.000000 Gross OT

;

data Agent_Schedule_Details;

input Report_Date :anydtdte. Agent_ID Shift_Code $

    (Start_Moment End_Moment) (:datetime.);

format Report_Date date9. Start_Moment End_Moment datetime15.;

datalines;

1-Mar-15 1234 CKEY 01MAR2015:07:00:00.000000 01MAR2015:07:08:00.000000

1-Mar-15 1234 PRD 01MAR2015:07:08:00.000000 01MAR2015:10:30:00.000000

1-Mar-15 1234 BRK1 01MAR2015:10:30:00.000000 01MAR2015:10:45:00.000000

1-Mar-15 1234 PRD 01MAR2015:10:45:00.000000 01MAR2015:14:00:00.000000

1-Mar-15 1234 LUNCH 01MAR2015:14:00:00.000000 01MAR2015:14:30:00.000000

1-Mar-15 1234 PRD 01MAR2015:14:30:00.000000 01MAR2015:15:37:00.000000

1-Mar-15 1234 BRKL 01MAR2015:15:37:00.000000 01MAR2015:15:52:00.000000

1-Mar-15 1234 PRD 01MAR2015:15:52:00.000000 01MAR2015:16:40:00.000000

1-Mar-15 1234 BRKO 01MAR2015:16:45:00.000000 01MAR2015:17:00:00.000000

1-Mar-15 1234 PRD 01MAR2015:17:00:00.000000 01MAR2015:18:00:00.000000

;

proc sql;

create table want as

select

    l.Report_Date,

    l.Agent_ID,

    l.Shift_Code,

    l.Start_Moment,

    min(l.End_Moment, r.End_Moment) as End_Moment format=datetime15.,

    r.Avail_Category

from

    Agent_Schedule_Details as l left join

    Agent_Shift as r on

        l.Agent_ID = r.Agent_ID and

        l.Start_Moment >= r.Start_Moment and

        l.Start_moment < r.End_Moment

union

select

    l.Report_Date,

    l.Agent_ID,

    l.Shift_Code,

    max(l.Start_Moment, r.Start_Moment) as Start_Moment format=datetime15.,

    l.End_Moment,

    r.Avail_Category

from

    Agent_Schedule_Details as l left join

    Agent_Shift as r on

        l.Agent_ID = r.Agent_ID and

        l.End_Moment > r.Start_Moment and

        l.End_moment <= r.End_Moment

order by Agent_ID, Start_Moment

;

select * from want;

quit;

   Report_Date  Agent_ID  Shift_Code     Start_Moment       End_Moment  Avail_Category

   -----------------------------------------------------------------------------------

     01MAR2015      1234  CKEY          01MAR15:07:00    01MAR15:07:08  Schedule Span

     01MAR2015      1234  PRD           01MAR15:07:08    01MAR15:10:30  Schedule Span

     01MAR2015      1234  BRK1          01MAR15:10:30    01MAR15:10:45  Schedule Span

     01MAR2015      1234  PRD           01MAR15:10:45    01MAR15:14:00  Schedule Span

     01MAR2015      1234  LUNCH         01MAR15:14:00    01MAR15:14:30  Schedule Span

     01MAR2015      1234  PRD           01MAR15:14:30    01MAR15:15:37  Schedule Span

     01MAR2015      1234  BRKL          01MAR15:15:37    01MAR15:15:52  Schedule Span

     01MAR2015      1234  PRD           01MAR15:15:52    01MAR15:16:30  Schedule Span

     01MAR2015      1234  PRD           01MAR15:16:30    01MAR15:16:40  Gross OT

     01MAR2015      1234  BRKO          01MAR15:16:45    01MAR15:17:00  Gross OT

     01MAR2015      1234  PRD           01MAR15:17:00    01MAR15:18:00  Gross OT

Note: You can't use between as your join condtion because consecutine End and Start shift times are exactly the same (16:30) and thus could both match your Detail Start_Moment.

PG

PG

View solution in original post


All Replies
SAS Employee
Posts: 340

Re: Updating dataset based on values of another dataset

proc sql;

     create table want as

     select l.*, r.Avail_Category

     from Agent_Schedule_Details as l left join Agent_Shift as r on (

          l.Agent_ID=r.Agent_ID and l.Start_Moment between r.Start_Moment and r.End_Moment

     )

;

quit;

The tricky part is in bold. You should precisely define how to match observations.

What if there is an activity for  agent 1234 from 16:25 to 16:35?

Solution
‎05-05-2015 11:19 PM
Respected Advisor
Posts: 4,649

Re: Updating dataset based on values of another dataset

If the Detail periods never span more than two consecutive shift categories, then you can fix the transactional data by splitting the spanning observations in two:

option linesize=120;

data agent_shift;

input Report_Date :anydtdte. Agent_ID Shift_Code $

    (Start_Moment End_Moment) (:datetime.) Avail_Category & :$20.;

format Report_Date date9. Start_Moment End_Moment datetime15.;

datalines;

1-Mar-15 1234 SHIFT 01MAR2015:07:00:00.000000 01MAR2015:16:30:00.000000 Schedule Span

1-Mar-15 1234 OVPH 01MAR2015:16:30:00.000000 01MAR2015:18:00:00.000000 Gross OT

;

data Agent_Schedule_Details;

input Report_Date :anydtdte. Agent_ID Shift_Code $

    (Start_Moment End_Moment) (:datetime.);

format Report_Date date9. Start_Moment End_Moment datetime15.;

datalines;

1-Mar-15 1234 CKEY 01MAR2015:07:00:00.000000 01MAR2015:07:08:00.000000

1-Mar-15 1234 PRD 01MAR2015:07:08:00.000000 01MAR2015:10:30:00.000000

1-Mar-15 1234 BRK1 01MAR2015:10:30:00.000000 01MAR2015:10:45:00.000000

1-Mar-15 1234 PRD 01MAR2015:10:45:00.000000 01MAR2015:14:00:00.000000

1-Mar-15 1234 LUNCH 01MAR2015:14:00:00.000000 01MAR2015:14:30:00.000000

1-Mar-15 1234 PRD 01MAR2015:14:30:00.000000 01MAR2015:15:37:00.000000

1-Mar-15 1234 BRKL 01MAR2015:15:37:00.000000 01MAR2015:15:52:00.000000

1-Mar-15 1234 PRD 01MAR2015:15:52:00.000000 01MAR2015:16:40:00.000000

1-Mar-15 1234 BRKO 01MAR2015:16:45:00.000000 01MAR2015:17:00:00.000000

1-Mar-15 1234 PRD 01MAR2015:17:00:00.000000 01MAR2015:18:00:00.000000

;

proc sql;

create table want as

select

    l.Report_Date,

    l.Agent_ID,

    l.Shift_Code,

    l.Start_Moment,

    min(l.End_Moment, r.End_Moment) as End_Moment format=datetime15.,

    r.Avail_Category

from

    Agent_Schedule_Details as l left join

    Agent_Shift as r on

        l.Agent_ID = r.Agent_ID and

        l.Start_Moment >= r.Start_Moment and

        l.Start_moment < r.End_Moment

union

select

    l.Report_Date,

    l.Agent_ID,

    l.Shift_Code,

    max(l.Start_Moment, r.Start_Moment) as Start_Moment format=datetime15.,

    l.End_Moment,

    r.Avail_Category

from

    Agent_Schedule_Details as l left join

    Agent_Shift as r on

        l.Agent_ID = r.Agent_ID and

        l.End_Moment > r.Start_Moment and

        l.End_moment <= r.End_Moment

order by Agent_ID, Start_Moment

;

select * from want;

quit;

   Report_Date  Agent_ID  Shift_Code     Start_Moment       End_Moment  Avail_Category

   -----------------------------------------------------------------------------------

     01MAR2015      1234  CKEY          01MAR15:07:00    01MAR15:07:08  Schedule Span

     01MAR2015      1234  PRD           01MAR15:07:08    01MAR15:10:30  Schedule Span

     01MAR2015      1234  BRK1          01MAR15:10:30    01MAR15:10:45  Schedule Span

     01MAR2015      1234  PRD           01MAR15:10:45    01MAR15:14:00  Schedule Span

     01MAR2015      1234  LUNCH         01MAR15:14:00    01MAR15:14:30  Schedule Span

     01MAR2015      1234  PRD           01MAR15:14:30    01MAR15:15:37  Schedule Span

     01MAR2015      1234  BRKL          01MAR15:15:37    01MAR15:15:52  Schedule Span

     01MAR2015      1234  PRD           01MAR15:15:52    01MAR15:16:30  Schedule Span

     01MAR2015      1234  PRD           01MAR15:16:30    01MAR15:16:40  Gross OT

     01MAR2015      1234  BRKO          01MAR15:16:45    01MAR15:17:00  Gross OT

     01MAR2015      1234  PRD           01MAR15:17:00    01MAR15:18:00  Gross OT

Note: You can't use between as your join condtion because consecutine End and Start shift times are exactly the same (16:30) and thus could both match your Detail Start_Moment.

PG

PG
New Contributor
Posts: 2

Re: Updating dataset based on values of another dataset

Thanks, PGStats.  That is exactly what I was looking for

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 328 views
  • 4 likes
  • 3 in conversation