Hi,
This is a continuation of the following threads, the second one was unanswered.
I have the following data and code (a different scenario). The fake data is just for illustration. The goal is to compute the duration by name and by date. In this example Beth works midnight till 7:30 am and come back to work 9:00 pm till midnight the same day. The gap is then between 7:30 am and 9:00 pm. The program works for Alex that has a different work schedule (no gap during the day, 7:30-3:30 pm). The program did not give me the expected result for Beth .
DATA fake;
LENGTH status $ 6;
INPUT @1 dt:datetime16. @18 name$
format dt datetime18.;
date = datepart(dt);
time = timepart(dt);
datalines;
04Jan11:00:21:12 Beth
04Jan11:02:19:06 Beth
04Jan11:05:00:13 Beth
04Jan11:07:12:11 Beth
04Jan11:21:10:25 Beth
04Jan11:23:26:13 Beth
05Jan11:00:23:51 Beth
05Jan11:01:30:10 Beth
05Jan11:03:12:06 Beth
05Jan11:05:06:15 Beth
05Jan11:07:23:11 Beth
05Jan11:21:30:25 Beth
05Jan11:23:28:15 Beth
04Jan11:07:43:00 Alex
04Jan11:08:12:10 Alex
04Jan11:09:40:00 Alex
04Jan11:11:20:16 Alex
04Jan11:12:25:14 Alex
04Jan11:14:20:07 Alex
;
run;
proc sort; by name date time;
data new;
format date date9. time time8.;
set fake;
by name date time;
if first.date then do;
if (minute(dt)>30) then newdate=dhms(date,hour(dt),0,1800);
if (minute(dt)<=30) then newdate=dhms(date,hour(dt),0,0);
end;
if last.date then do;
if (minute(dt)>30) then newdate=dhms(date,hour(dt),0,3600);
if (minute(dt)<=30) then newdate=dhms(date,hour(dt),0,1800);
end;
format newdate datetime16.;
run;
proc means data=new nway noprint;
class date name;
format newdate dtdate9.;
output out=want (drop=_:) range(newdate)=duration;
run;
data want_output;
set want;
duration_time=duration;
duration=duration/60/60;
format duration;
format duration_time hhmm8.2 date date9.;
run;
The resulting output is
04JAN2011 Alex 7 7:00.00
04JAN2011 Beth 23.5 23:30.00
05JAN2011 Beth 23.5 23:30.00
The desired output is
04JAN2011 Alex 7 7:00.00
04JAN2011 Beth 10.5 10:30.00
05JAN2011 Beth 10.5 10:30.00
There are wonderful helpers in this site. I appreciate any help.
Thanks.
Your second question didn't really have a valid dataset to test in the previous post.
The record with data in the other post:
19APR12:00:24:01 19APR12:00:00:00 Beth 4
was out of order and would have resulted in a much longer interval.
And your example data again
04Jan11:00:21:12 Beth
04Jan11:02:19:06 Beth
04Jan11:05:00:13 Beth
04Jan11:07:12:11 Beth -> start of a roughly 14 hour gap. WHAT IS THE EXPLICIT RULE THAT SAYS THERE
IS AN END AT AROUND 11:00:00 TO GET YOUR 10.5?
04Jan11:21:10:25 Beth -> AND THIS LOOKS LIKE THE TIME IS SUPPOSED TO BE COUNTED IN THE NEXT DAY
WHAT IS THE RULE?
04Jan11:23:26:13 Beth
If you have a separate table that shows assigned shifts this might be possible but without a general set of rules we can generate code that will work for one or two explicit cases but not a generic data set.
If you can say something like "if the gap exceeds 2 hours then consider that the start of a new interval" but it needs to be something that is valid EVERY time it occurs. The next part is which date does the total get assigned to? The date the "shift" starts or ends?
Thanks for your reply.
In this fake data, Beth starts at midnight 04Jan11:00:00:00, get first assignment at 04Jan11:00:21:12 and last assignment in the morning was at 04Jan11:07:12:11 and leave at 04Jan11:07:30:00. Beth get back the same day (yes gap exceed 2 hours) at 04Jan21:00:00:00 (9:00 pm), get first assignment at 04Jan11:21:10:25 and last assignment at 04Jan11:23:26:13 and leave at 04Jan11:23:59:59 (midnight) the same day. In short the schedule for Beth is 9:00 pm to 7:30 am the next day.
I can say "if the gap exceeds 2 hours then new interval". The duration is the difference between each starting date and ending date in each interval, by name and by day.
I can say "if the gap exceeds 2 hours then new interval". The duration is the difference between each starting date and ending date in each interval, by name and by day.
That makes Beth have 3 shifts on the 4th of January. If this is acceptable then add the Shift variable to proc means. Run proc means twice, once by date/shift and then use those results to sum by date.
data new;
format date date9. time time8.;
set fake;
by name date time;
retain shift 0;
if first.date then do;
if (minute(dt)>30) then newdate=dhms(date,hour(dt),0,1800);
if (minute(dt)<=30) then newdate=dhms(date,hour(dt),0,0);
end;
if last.date then do;
if (minute(dt)>30) then newdate=dhms(date,hour(dt),0,3600);
if (minute(dt)<=30) then newdate=dhms(date,hour(dt),0,1800);
end;
format newdate datetime16.;
dif_time=dif(time);
if first.date then do;
dif_time=.;
shift=1;
end;
if (dif_time/60/60)>2 then shift+1;
run;
Thank you for the attempt. It is my fault if it is unclear.
It is true Beth has three shifts ( I changed the difference to exceed 5 hours to be accurate) as should be seen in newdate:
1. 04JAN11:00:00:00 till 04JAN11:07:30:00
2: 04JAN11:21:00:00 till 05JAN11:07:30:00, this shift is the bottle neck. It crosses two days.
3: 05JAN11:21:00:00 till 05JAN11:00:00:00
I still would like an output per date and per name. The desired output is
dt name duration duration_time
04JAN2011 Alex 7 7:00.00
04JAN2011 Beth 10.5 10:30.00
05JAN2011 Beth 10.5 10:30.00
Thanks,
As I mentioned you can run two proc means and get the results you need. I'd really rather not write the whole code for you, since your getting paid for it, and more importantly it shows you understand whats happening.
And again
leave at 04Jan11:23:59:59 (midnight) the same day
requires an assumption as that is not in the data.
With
04Jan11:23:26:13 Beth
05Jan11:00:23:51 Beth
It really doesn't look like she "left" at midnight.
Please see my reply to Reeza.
The new variable newdata was created to display the beginning and ending time.
Beth left at midnight. The next data is the next day so she left.
04Jan11:23:26:13 Beth
05Jan11:00:23:51 Beth
It really doesn't look like she "left" at midnight.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.