Help using Base SAS procedures

Updating dataset based on values of another dataset

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Updating dataset based on values of another dataset

Hi,

I am new to SAS, so please bear with me, and I am mostly using it for data manipulation.  The problem I am trying to solve is updating a variable in a dataset based upon value from another dataset.  My datastructure is below:

Dataset1 - Agent_Shift

Variables: Report_Date, Employee_ID, Shift_Code, Avail_Category, Start_moment (timestamp), End_Moment (timestamp)

This data set has the timespans for an agent's Shift, Overtime, and Makeup time.  Below, for example Agent1234 is scheduled to work from 01MAR2015:07:00:00 to 01MAR2015:16:30:00 is the schedule span / shift and the same employee has Overtime from 16:30 to 18:00.


Agent_Shift dataset: This dataset would update the Avail_category of Agent_Schedule_Details below

Report_DateAgent_IDShift_CodeStart_MomentEnd_MomentAvail_Category
1-Mar-151234SHIFT01MAR2015:07:00:00.00000001MAR2015:16:30:00.000000Schedule Span
1-Mar-151234OVPH01MAR2015:16:30:00.00000001MAR2015:18:00:00.000000Gross OT


Dataset2 - Agent_Schedule_Details

Variables: Report_Date, Employee_ID, Shift_Code, Avail_Category (currently blank), Start_moment (timestamp), End_Moment (timestamp)

This dataset contains the transactional timestamps of an agents workday.  For example agent 1234 was scheduled for production work at 5 different times throughout the day (PRD), had 2 breaks at specific time stamps as well as a lunch.

Agent_Schedule_Details dataset (Avail_Category will need to be updated from Avail Category of Agent_Shift)

Report_DateAgent_IDShift_CodeStart_MomentEnd_MomentAvail_Category
1-Mar-151234CKEY01MAR2015:07:00:00.00000001MAR2015:07:08:00.000000Should be Schedule Span
1-Mar-151234

PRD

01MAR2015:07:08:00.00000001MAR2015:10:30:00.000000Should be Schedule Span
1-Mar-151234BRK101MAR2015:10:30:00.00000001MAR2015:10:45:00.000000Should be Schedule Span
1-Mar-151234PRD01MAR2015:10:45:00.00000001MAR2015:14:00:00.000000Should be Schedule Span
1-Mar-151234LUNCH01MAR2015:14:00:00.00000001MAR2015:14:30:00.000000Should be Schedule Span
1-Mar-151234PRD01MAR2015:14:30:00.00000001MAR2015:15:37:00.000000Should be Schedule Span
1-Mar-151234BRKL01MAR2015:15:37:00.00000001MAR2015:15:52:00.000000Should be Schedule Span
1-Mar-151234PRD01MAR2015:15:52:00.00000001MAR2015:16:30:00.000000Should be Schedule Span
1-Mar-151234BRKO01MAR2015:16:45:00.00000001MAR2015:17:00:00.000000Should be Gross OT
1-Mar-151234PRD01MAR2015:17:00:00.00000001MAR2015:18:00:00.000000Should be Gross OT

The issue I am having is how do I update the Avail category based on the Avail_Category and timestamp of the Agent_Shift dataset?  I know there should be logic to see if the start and stop timestamps of each observation within Agent_Schedule_Details falls with the time stamp of the observations for the Agents_Shift.  I am just struggling with how to do this update.


Accepted Solutions
Solution
‎05-05-2015 11:19 PM
Respected Advisor
Posts: 4,935

Re: Updating dataset based on values of another dataset

Posted in reply to gergely_batho

If the Detail periods never span more than two consecutive shift categories, then you can fix the transactional data by splitting the spanning observations in two:

option linesize=120;

data agent_shift;

input Report_Date :anydtdte. Agent_ID Shift_Code $

    (Start_Moment End_Moment) (:datetime.) Avail_Category & :$20.;

format Report_Date date9. Start_Moment End_Moment datetime15.;

datalines;

1-Mar-15 1234 SHIFT 01MAR2015:07:00:00.000000 01MAR2015:16:30:00.000000 Schedule Span

1-Mar-15 1234 OVPH 01MAR2015:16:30:00.000000 01MAR2015:18:00:00.000000 Gross OT

;

data Agent_Schedule_Details;

input Report_Date :anydtdte. Agent_ID Shift_Code $

    (Start_Moment End_Moment) (:datetime.);

format Report_Date date9. Start_Moment End_Moment datetime15.;

datalines;

1-Mar-15 1234 CKEY 01MAR2015:07:00:00.000000 01MAR2015:07:08:00.000000

1-Mar-15 1234 PRD 01MAR2015:07:08:00.000000 01MAR2015:10:30:00.000000

1-Mar-15 1234 BRK1 01MAR2015:10:30:00.000000 01MAR2015:10:45:00.000000

1-Mar-15 1234 PRD 01MAR2015:10:45:00.000000 01MAR2015:14:00:00.000000

1-Mar-15 1234 LUNCH 01MAR2015:14:00:00.000000 01MAR2015:14:30:00.000000

1-Mar-15 1234 PRD 01MAR2015:14:30:00.000000 01MAR2015:15:37:00.000000

1-Mar-15 1234 BRKL 01MAR2015:15:37:00.000000 01MAR2015:15:52:00.000000

1-Mar-15 1234 PRD 01MAR2015:15:52:00.000000 01MAR2015:16:40:00.000000

1-Mar-15 1234 BRKO 01MAR2015:16:45:00.000000 01MAR2015:17:00:00.000000

1-Mar-15 1234 PRD 01MAR2015:17:00:00.000000 01MAR2015:18:00:00.000000

;

proc sql;

create table want as

select

    l.Report_Date,

    l.Agent_ID,

    l.Shift_Code,

    l.Start_Moment,

    min(l.End_Moment, r.End_Moment) as End_Moment format=datetime15.,

    r.Avail_Category

from

    Agent_Schedule_Details as l left join

    Agent_Shift as r on

        l.Agent_ID = r.Agent_ID and

        l.Start_Moment >= r.Start_Moment and

        l.Start_moment < r.End_Moment

union

select

    l.Report_Date,

    l.Agent_ID,

    l.Shift_Code,

    max(l.Start_Moment, r.Start_Moment) as Start_Moment format=datetime15.,

    l.End_Moment,

    r.Avail_Category

from

    Agent_Schedule_Details as l left join

    Agent_Shift as r on

        l.Agent_ID = r.Agent_ID and

        l.End_Moment > r.Start_Moment and

        l.End_moment <= r.End_Moment

order by Agent_ID, Start_Moment

;

select * from want;

quit;

   Report_Date  Agent_ID  Shift_Code     Start_Moment       End_Moment  Avail_Category

   -----------------------------------------------------------------------------------

     01MAR2015      1234  CKEY          01MAR15:07:00    01MAR15:07:08  Schedule Span

     01MAR2015      1234  PRD           01MAR15:07:08    01MAR15:10:30  Schedule Span

     01MAR2015      1234  BRK1          01MAR15:10:30    01MAR15:10:45  Schedule Span

     01MAR2015      1234  PRD           01MAR15:10:45    01MAR15:14:00  Schedule Span

     01MAR2015      1234  LUNCH         01MAR15:14:00    01MAR15:14:30  Schedule Span

     01MAR2015      1234  PRD           01MAR15:14:30    01MAR15:15:37  Schedule Span

     01MAR2015      1234  BRKL          01MAR15:15:37    01MAR15:15:52  Schedule Span

     01MAR2015      1234  PRD           01MAR15:15:52    01MAR15:16:30  Schedule Span

     01MAR2015      1234  PRD           01MAR15:16:30    01MAR15:16:40  Gross OT

     01MAR2015      1234  BRKO          01MAR15:16:45    01MAR15:17:00  Gross OT

     01MAR2015      1234  PRD           01MAR15:17:00    01MAR15:18:00  Gross OT

Note: You can't use between as your join condtion because consecutine End and Start shift times are exactly the same (16:30) and thus could both match your Detail Start_Moment.

PG

PG

View solution in original post


All Replies
SAS Employee
Posts: 340

Re: Updating dataset based on values of another dataset

proc sql;

     create table want as

     select l.*, r.Avail_Category

     from Agent_Schedule_Details as l left join Agent_Shift as r on (

          l.Agent_ID=r.Agent_ID and l.Start_Moment between r.Start_Moment and r.End_Moment

     )

;

quit;

The tricky part is in bold. You should precisely define how to match observations.

What if there is an activity for  agent 1234 from 16:25 to 16:35?

Solution
‎05-05-2015 11:19 PM
Respected Advisor
Posts: 4,935

Re: Updating dataset based on values of another dataset

Posted in reply to gergely_batho

If the Detail periods never span more than two consecutive shift categories, then you can fix the transactional data by splitting the spanning observations in two:

option linesize=120;

data agent_shift;

input Report_Date :anydtdte. Agent_ID Shift_Code $

    (Start_Moment End_Moment) (:datetime.) Avail_Category & :$20.;

format Report_Date date9. Start_Moment End_Moment datetime15.;

datalines;

1-Mar-15 1234 SHIFT 01MAR2015:07:00:00.000000 01MAR2015:16:30:00.000000 Schedule Span

1-Mar-15 1234 OVPH 01MAR2015:16:30:00.000000 01MAR2015:18:00:00.000000 Gross OT

;

data Agent_Schedule_Details;

input Report_Date :anydtdte. Agent_ID Shift_Code $

    (Start_Moment End_Moment) (:datetime.);

format Report_Date date9. Start_Moment End_Moment datetime15.;

datalines;

1-Mar-15 1234 CKEY 01MAR2015:07:00:00.000000 01MAR2015:07:08:00.000000

1-Mar-15 1234 PRD 01MAR2015:07:08:00.000000 01MAR2015:10:30:00.000000

1-Mar-15 1234 BRK1 01MAR2015:10:30:00.000000 01MAR2015:10:45:00.000000

1-Mar-15 1234 PRD 01MAR2015:10:45:00.000000 01MAR2015:14:00:00.000000

1-Mar-15 1234 LUNCH 01MAR2015:14:00:00.000000 01MAR2015:14:30:00.000000

1-Mar-15 1234 PRD 01MAR2015:14:30:00.000000 01MAR2015:15:37:00.000000

1-Mar-15 1234 BRKL 01MAR2015:15:37:00.000000 01MAR2015:15:52:00.000000

1-Mar-15 1234 PRD 01MAR2015:15:52:00.000000 01MAR2015:16:40:00.000000

1-Mar-15 1234 BRKO 01MAR2015:16:45:00.000000 01MAR2015:17:00:00.000000

1-Mar-15 1234 PRD 01MAR2015:17:00:00.000000 01MAR2015:18:00:00.000000

;

proc sql;

create table want as

select

    l.Report_Date,

    l.Agent_ID,

    l.Shift_Code,

    l.Start_Moment,

    min(l.End_Moment, r.End_Moment) as End_Moment format=datetime15.,

    r.Avail_Category

from

    Agent_Schedule_Details as l left join

    Agent_Shift as r on

        l.Agent_ID = r.Agent_ID and

        l.Start_Moment >= r.Start_Moment and

        l.Start_moment < r.End_Moment

union

select

    l.Report_Date,

    l.Agent_ID,

    l.Shift_Code,

    max(l.Start_Moment, r.Start_Moment) as Start_Moment format=datetime15.,

    l.End_Moment,

    r.Avail_Category

from

    Agent_Schedule_Details as l left join

    Agent_Shift as r on

        l.Agent_ID = r.Agent_ID and

        l.End_Moment > r.Start_Moment and

        l.End_moment <= r.End_Moment

order by Agent_ID, Start_Moment

;

select * from want;

quit;

   Report_Date  Agent_ID  Shift_Code     Start_Moment       End_Moment  Avail_Category

   -----------------------------------------------------------------------------------

     01MAR2015      1234  CKEY          01MAR15:07:00    01MAR15:07:08  Schedule Span

     01MAR2015      1234  PRD           01MAR15:07:08    01MAR15:10:30  Schedule Span

     01MAR2015      1234  BRK1          01MAR15:10:30    01MAR15:10:45  Schedule Span

     01MAR2015      1234  PRD           01MAR15:10:45    01MAR15:14:00  Schedule Span

     01MAR2015      1234  LUNCH         01MAR15:14:00    01MAR15:14:30  Schedule Span

     01MAR2015      1234  PRD           01MAR15:14:30    01MAR15:15:37  Schedule Span

     01MAR2015      1234  BRKL          01MAR15:15:37    01MAR15:15:52  Schedule Span

     01MAR2015      1234  PRD           01MAR15:15:52    01MAR15:16:30  Schedule Span

     01MAR2015      1234  PRD           01MAR15:16:30    01MAR15:16:40  Gross OT

     01MAR2015      1234  BRKO          01MAR15:16:45    01MAR15:17:00  Gross OT

     01MAR2015      1234  PRD           01MAR15:17:00    01MAR15:18:00  Gross OT

Note: You can't use between as your join condtion because consecutine End and Start shift times are exactly the same (16:30) and thus could both match your Detail Start_Moment.

PG

PG
New Contributor
Posts: 2

Re: Updating dataset based on values of another dataset

Thanks, PGStats.  That is exactly what I was looking for

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 346 views
  • 4 likes
  • 3 in conversation