DATA Step, Macro, Functions and more

Difference between fields with criteria

Accepted Solution Solved
Reply
Contributor
Posts: 68
Accepted Solution

Difference between fields with criteria

Good Day All!

 

I am having a bit of a memory block on this one.

 

What i have essentially is a list of users and a a seperate record and timestamp each time they changed thier status in the production system we are using.   What i am looking to do in the end is get a total time of all statuses that one of the users had during the day, then skip the very last status for that agent as the next record will have a different agent name... this is the part that is confusing me... 

 

At the end... i want to add up the statuses to get a total time in each status... so like i can see available time (waiting for an issue)... working (working on an issue) and unavailable (not working or waiting for an issue)

 

Can anyone give me some help in what i need to do to accomplish this?  It's straight forward data but i just can't figure out the right combination of code to make it happen.

 

Thanks,

Dean


Accepted Solutions
Solution
‎05-11-2016 05:32 PM
Super User
Posts: 5,082

Re: Difference between fields with criteria

As long as other variables are not particularly important, the simplest way to compute this would be to calculate based on the ending record, not the starting record.  For example:

 

proc sort data=have;

by user timestamp;

run;

 

data want;

set have;

by user;

prior_status = lag(status);

duration = dif(timestamp);

if first.user=0;

keep user prior_status duration;

run;

View solution in original post


All Replies
Solution
‎05-11-2016 05:32 PM
Super User
Posts: 5,082

Re: Difference between fields with criteria

As long as other variables are not particularly important, the simplest way to compute this would be to calculate based on the ending record, not the starting record.  For example:

 

proc sort data=have;

by user timestamp;

run;

 

data want;

set have;

by user;

prior_status = lag(status);

duration = dif(timestamp);

if first.user=0;

keep user prior_status duration;

run;

Contributor
Posts: 68

Re: Difference between fields with criteria

Thanks for that... it is some awesome code... 

 

so i have data that looks like this:

 

10MAY16:07:58:14	Available	0:09:44
10MAY16:08:00:53	Unavailable	0:02:39
10MAY16:08:08:20	Available	0:07:28
10MAY16:08:52:18	Work In Progress	0:43:57
10MAY16:08:55:18	Available	0:03:00
10MAY16:09:06:17	Work In Progress	0:10:59
10MAY16:09:36:49	Unavailable	0:30:32
10MAY16:10:25:34	Available	0:48:44
10MAY16:10:55:53	Unavailable	0:30:19
10MAY16:11:10:54	Available	0:15:02
10MAY16:11:15:18	Work In Progress	0:04:23
10MAY16:12:23:02	Unavailable	1:07:44
10MAY16:12:32:34	Available	0:09:32
10MAY16:12:54:10	Work In Progress	0:21:37
10MAY16:13:00:19	Available	0:06:09
10MAY16:13:33:23	Work In Progress	0:33:03
10MAY16:14:23:41	Unavailable	0:50:19
10MAY16:14:34:24	Available	0:10:42
10MAY16:14:38:09	Work In Progress	0:03:45
10MAY16:14:52:01	Available	0:13:53

So the general idea is that i can sum up available times, unavailable times and work in progress times and get an accurate #... however.... the correct data is one line down if that makes sense.... so on the very first line... the available time i want to be 2:39 instead of the 2nd line for the duration... 

 

I am not sure how to do this... but i am going to learn it tonight i have a feeling so i can do this in the future... its really cool.  I totally appreciate your help.

 

Dean

Super User
Posts: 10,500

Re: Difference between fields with criteria

I think you need to show more detail as I am not sure how you are intending to use the "second line". Show the values such as in arithmetic and the expected result for a few lines of the data.

Contributor
Posts: 68

Re: Difference between fields with criteria

So  basically what i am doing is starting from a list that shows me user, status, timestamp (that user went into said status)... then i have 3 status catagories  (Available, Unavailable, and Work in Progress).

 

What i want to do is calculate the total sum of each status by user.  The first step was to understand how to get the differences between the timestamps and into the correct fields so i can do the calculation afterwards.  The the last record for each user should be blank by default because there is not another login time below it for the same user.

 

Now the code above works great... but the data is on the wrong line... the "Is" column as shown below generates the calculation and the "Should Be" is what i am looking for it to read.  So in essence, off by one line. and there was data in the very last record on each agent... which i want to keep blank.

 

Taking the first 2 lines of the table as an example... john was available agt 7:58:14, and then went unavailable at 8:00:53... which means that John was Available for 2.6 minutes.... so in the line one column 3 (Available) i want to see john as available for 2.6 minutes., not 9.7 minutes... but the 2.6 minutes in the "Is" column is on line 2.

 

Agent Timestamp Prior_Status Is Should Be
John Doe 10MAY16:07:58:14 Available 9.7 2.6
John Doe 10MAY16:08:00:53 Unavailable 2.6 7.5
John Doe 10MAY16:08:08:20 Available 7.5 44
John Doe 10MAY16:08:52:18 Work In Progress 44 3
John Doe 10MAY16:08:55:18 Available 3 11
John Doe 10MAY16:09:06:17 Work In Progress 11 30.5
John Doe 10MAY16:09:36:49 Unavailable 30.5 48.7
John Doe 10MAY16:10:25:34 Available 48.7 30.3
John Doe 10MAY16:10:55:53 Unavailable 30.3 15
John Doe 10MAY16:11:10:54 Available 15 4.4
John Doe 10MAY16:11:15:18 Work In Progress 4.4 67.7
John Doe 10MAY16:12:23:02 Unavailable 67.7 --

 

 

Super User
Posts: 10,500

Re: Difference between fields with criteria

Quick and (very) dirty: Sort the data by agent and descending timestamp; Use @Astounding's code but muliply the dif by -1 then resort the result by agent and timestamp.

Though I might look for something a bit more elegant if there were millions of records involved. Maybe.

 

Super User
Posts: 5,082

Re: Difference between fields with criteria

I think the code I gave you originally should be fine, as long as you focus on the calculated variable named DURATION.  The variable named LS might be the one that was already in your data, but should be ignored.

 

At any rate, if you have additional variables that you want to keep around, here is another way.

 

data want;

set have;

by id;

if last.id=0 then do;

   _n_ = _n_ + 1;

   set have (keep=LS rename=(LS=replacement_LS)) point=_n_;

end;

else replacement_LS=.;

run;

 

This code moves the duration back to the previous record, using the variable name REPLACEMENT_LS.

Contributor
Posts: 68

Re: Difference between fields with criteria

Thanks Astounding!  I feel like an idiot though... our original code worked perfectly... i had to go back to the raw data before it was imported to check something and realized that i was wrong... everything was in the right place... I can't believe i missed it... 

 

Everything is good to go... thanks for the outstanding code... you saved my life and made my VP very happy.  She has been after this data for over a year and knew there was a way to do it... just had to reach out.... thanks again!

 

Dean

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 284 views
  • 0 likes
  • 3 in conversation