Hi,
I have two grouping variables: date and name. I extract hour from date (starttime). The goal is to compare remote versus local. There are two days of data for illustration.
I would like to calculate the number of worked hours per day per name (by taking the difference between starttime each day).
I would like also to calculate the total count per day per name. I have tried but was not successful. Here is the data and the code I tried. I appreciate any help.
DATA staff;
LENGTH status $ 6;
INPUT @1 date datetime16. @18 name$ @23 count @25 status $;
CARDS;
04Jan11:17:00:00 Beth 3 remote
04Jan11:17:30:00 Beth 3 remote
04Jan11:18:00:00 Beth 2 remote
04Jan11:18:30:00 Beth 0 remote
04Jan11:19:00:00 Beth 2 remote
04Jan11:17:30:00 Barb 4 local
04Jan11:18:00:00 Barb 3 local
04Jan11:18:30:00 Barb 2 local
04Jan11:19:00:00 Barb 6 local
04Jan11:07:30:00 Alex 1 local
04Jan11:08:00:00 Alex 2 local
04Jan11:08:30:00 Alex 8 local
04Jan11:09:00:00 Alex 8 local
04Jan11:09:30:00 Alex 2 local
04Jan11:10:00:00 Alex 1 local
05Jan11:19:30:00 Beth 4 local
05Jan11:20:00:00 Beth 2 local
05Jan11:20:30:00 Beth 0 local
05Jan11:21:00:00 Beth 3 local
05Jan11:21:30:00 Beth 1 local
05Jan11:08:30:00 Barb 1 remote
05Jan11:09:00:00 Barb 6 remote
05Jan11:09:30:00 Barb 1 remote
05Jan11:10:00:00 Barb 6 remote
05Jan11:10:30:00 Barb 1 remote
05Jan11:11:00:00 Barb 2 remote
05Jan11:11:30:00 Barb 1 remote
05Jan11:10:30:00 Alex 0 local
05Jan11:11:00:00 Alex 3 local
05Jan11:11:30:00 Alex 4 local
;
run;
proc print data=staff noobs;
format date datetime18.;
run;
data temp;
set staff;
datep=datepart(date);
starttime=hour(date);
run;
proc sort data=temp;
by datep name;
run;
data summarize;
set temp;
by datep name;
retain R_time R_count;
/* number of worked hours per day per name */
if first.name & first.datep then do;
R_time=starttime;
end;
if last.name & last.datep then do;
diff_time=starttime-R_time;
output;
end;
if first.name & first.datep then
R_count=0;
R_count+count;
if last.name & last.datep;
run;
The final output would look like
datep | date | name | R_count | diff_time |
18631 | 1609754400 | Alex | 22 | 6 |
18631 | 1609786800 | Barb | 15 | 4 |
18631 | 1609786800 | Beth | 10 | 5 |
18632 | 1609846200 | Alex | 7 | 3 |
18632 | 1609846200 | Barb | 18 | 7 |
18632 | 1609882200 | Beth | 10 | 5 |
Thanks.
Thanks for providing sample data.
I would recommend a proc means instead of data step process.
Here's an example of how that would work. Some efficiency tricks:
proc means data=staff nway noprint;
class date name;
format date dtdate9.;
output out=want (drop=_:) sum(count)=r_count range(date)=duration;
run;
data want_output;
set want;
duration_time=duration;
duration=duration/60/60;
format duration;
format duration_time time5.;
run;
proc print data=want_output;
run;
Hi there, thank you for asking a question in Communities on SAS:
I am moving your question from the General Usage community to the' SAS Macro Facility, Data Step and SAS Language Elements' community so that you might find an answer more quickly. Thank you !
Thanks for providing sample data.
I would recommend a proc means instead of data step process.
Here's an example of how that would work. Some efficiency tricks:
proc means data=staff nway noprint;
class date name;
format date dtdate9.;
output out=want (drop=_:) sum(count)=r_count range(date)=duration;
run;
data want_output;
set want;
duration_time=duration;
duration=duration/60/60;
format duration;
format duration_time time5.;
run;
proc print data=want_output;
run;
Awesome! Thank you for the quick reply.
Hi,
Here is a different scenario where the dates were different with a gap in between, using proc means with range(date)=duration does not work.
DATA staff;
LENGTH status $ 6;
INPUT @1 date datetime18. @18 name$ @23 count @25 status $;
CARDS;
04Jan11:00:05:35 Beth 3 remote
04Jan11:10:11:32 Beth 3 remote
04Jan11:15:34:20 Beth 2 remote
04Jan11:16:45:45 Beth 0 remote
04Jan11:17:00:19 Beth 2 remote
04Jan11:18:30:00 Beth 4 remote
04Jan11:20:00:00 Beth 3 remote
04Jan11:22:00:13 Beth 2 remote
04Jan11:00:30:00 Alex 1 local
04Jan11:12:00:00 Alex 2 local
04Jan11:13:30:00 Alex 8 local
04Jan11:14:00:00 Alex 8 local
04Jan11:15:30:00 Alex 2 local
04Jan11:16:00:00 Alex 1 local
05Jan11:17:30:00 Alex 4 local
05Jan11:20:06:00 Alex 2 local
05Jan11:21:12:00 Alex 0 local
05Jan11:22:10:36 Alex 3 local
;
run;
For example for Beth the minimum is 04Jan11:00:05:13 and the maximum is 04Jan11:22:05:13 , giving a duration of 22 hours which is incorrect. Beth worked from 04Jan11:10:11:32 to 04Jan11:00:05:35 (04Jan11:24:05:35) , a duration of approximately 14 hours. How your former code can be changed to obtain the correct duration? Also Alex started at 04Jan11:12:00:00 pm and get of at 04Jan11:00:30:00 (04Jan11:24:10:00)
I appreciate the help
Thanks.
That doesn't make sense.
04Jan11:00:05:35 (04Jan11:24:05:35) would be January 05 not January 04.
In a 24 hour clock there is no 24 hours, the hours go from 0 to 23.
That is right. What I mean is the following: Beth works from 24APR12:00:07:47 then get offs at 24APR12:07:16:59 and come back at 24APR12:21:51:31 to get off again at 24APR12:23:49:32 the same day
24APR12:00:07:47 | 2 | Beth |
24APR12:00:17:21 | 0 | Beth |
24APR12:01:40:14 | 2 | Beth |
24APR12:01:44:47 | 0 | Beth |
24APR12:01:49:50 | 0 | Beth |
24APR12:02:01:22 | 0 | Beth |
24APR12:02:09:52 | 2 | Beth |
24APR12:02:56:15 | 0 | Beth |
24APR12:05:35:23 | 2 | Beth |
24APR12:06:02:00 | 0 | Beth |
24APR12:06:31:26 | 0 | Beth |
24APR12:06:45:24 | 0 | Beth |
24APR12:06:56:53 | 0 | Beth |
24APR12:07:00:54 | 2 | Beth |
24APR12:07:16:59 | 2 | Beth |
24APR12:21:51:31 | 0 | Beth |
24APR12:22:14:44 | 2 | Beth |
24APR12:22:22:47 | 0 | Beth |
24APR12:22:25:40 | 1 | Beth |
24APR12:23:00:40 | 0 | Beth |
24APR12:23:05:04 | 0 | Beth |
24APR12:23:06:21 | 0 | Beth |
24APR12:23:25:39 | 0 | Beth |
24APR12:23:49:32 | 0 | Beth |
Thank you for your patience
So a duration of about 7 hours and then 2 hours?
How do you identify when a work day starts/ends?
Correct.
There is no way to identify in the data when it starts, when it ends. I have a rough idea based on schedule. I think I can't use these dates to find out duration.
Thanks for your help.
You COULD make a rule that if there is a span of more than 5 (or X) hours between times that a new shift has started?
If some thing like that would work, post it as a new question.
Yes, will post it as a new question. Thanks.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.