DATA Step, Macro, Functions and more

use of first and last variable in data step

Accepted Solution Solved
Reply
Contributor
Posts: 20
Accepted Solution

use of first and last variable in data step

Hi,

I have two grouping variables: date and name. I extract hour from date (starttime). The goal is to compare remote versus local. There are two days of data for illustration.

I would like to calculate the number of worked hours per day per name (by taking the difference between starttime each day).

I would like also to calculate the total count per day per name. I have tried but was not successful. Here is the data and the code I tried. I appreciate any help.

DATA staff;

   LENGTH status $ 6;

   INPUT @1 date datetime16. @18 name$ @23 count @25 status $;

CARDS;

04Jan11:17:00:00 Beth 3 remote

04Jan11:17:30:00 Beth 3 remote

04Jan11:18:00:00 Beth 2 remote

04Jan11:18:30:00 Beth 0 remote

04Jan11:19:00:00 Beth 2 remote

04Jan11:17:30:00 Barb 4 local

04Jan11:18:00:00 Barb 3 local

04Jan11:18:30:00 Barb 2 local

04Jan11:19:00:00 Barb 6 local

04Jan11:07:30:00 Alex 1 local

04Jan11:08:00:00 Alex 2 local

04Jan11:08:30:00 Alex 8 local

04Jan11:09:00:00 Alex 8 local

04Jan11:09:30:00 Alex 2 local

04Jan11:10:00:00 Alex 1 local

05Jan11:19:30:00 Beth 4 local

05Jan11:20:00:00 Beth 2 local

05Jan11:20:30:00 Beth 0 local

05Jan11:21:00:00 Beth 3 local

05Jan11:21:30:00 Beth 1 local

05Jan11:08:30:00 Barb 1 remote

05Jan11:09:00:00 Barb 6 remote

05Jan11:09:30:00 Barb 1 remote

05Jan11:10:00:00 Barb 6 remote

05Jan11:10:30:00 Barb 1 remote

05Jan11:11:00:00 Barb 2 remote

05Jan11:11:30:00 Barb 1 remote

05Jan11:10:30:00 Alex 0 local

05Jan11:11:00:00 Alex 3 local

05Jan11:11:30:00 Alex 4 local

;

run;

proc print data=staff noobs;

  format date datetime18.;

run;

data temp;

set staff;

datep=datepart(date);

starttime=hour(date);

run;

proc sort data=temp;

  by datep name;

run;

data summarize;

  set temp;

  by datep name;

  retain R_time R_count;

/* number of worked hours per day per name */

  if first.name & first.datep then do;

      R_time=starttime;

  end;

  if last.name & last.datep then do;

     diff_time=starttime-R_time;

     output;

  end;

  if first.name & first.datep then

     R_count=0;

     R_count+count;

  if last.name & last.datep;

run;

The final output would look like

datepdatenameR_countdiff_time
186311609754400Alex226
186311609786800Barb154
186311609786800Beth105
186321609846200Alex73
186321609846200Barb187
186321609882200Beth105

Thanks.


Accepted Solutions
Solution
‎01-28-2015 02:59 PM
Super User
Posts: 17,784

Re: use of first and last variable in data step

Thanks for providing sample data.

I would recommend a proc means instead of data step process.

Here's an example of how that would work.  Some efficiency tricks:

  1. Use format dtdate9 on your datetime variable to summarize data by date
  2. Use Range for the date variable to obtain the max time - min time
  3. Datetime is stored as seconds, so convert to a number by dividing by 60 for minutes and another 60 for hours
  4. Or for datetime use a time format, both options are shown in the output.
  5. Not implemented - but you can add STATUS to the CLASS statement to summarize by local vs remote if required.

proc means data=staff nway noprint;

class date name;

format date dtdate9.;

output out=want (drop=_Smiley Happy sum(count)=r_count range(date)=duration;

run;

data want_output;

set want;

duration_time=duration;

duration=duration/60/60;

format duration;

format duration_time time5.;

run;

proc print data=want_output;

run;

View solution in original post


All Replies
SAS Employee
Posts: 232

Re: use of first and last variable in data step

Hi there, thank you for asking a question in Communities on SAS:

I am moving your question from the General Usage community to the' SAS Macro Facility, Data Step and SAS Language Elements' community so that you might find an answer more quickly. Thank you !

Solution
‎01-28-2015 02:59 PM
Super User
Posts: 17,784

Re: use of first and last variable in data step

Thanks for providing sample data.

I would recommend a proc means instead of data step process.

Here's an example of how that would work.  Some efficiency tricks:

  1. Use format dtdate9 on your datetime variable to summarize data by date
  2. Use Range for the date variable to obtain the max time - min time
  3. Datetime is stored as seconds, so convert to a number by dividing by 60 for minutes and another 60 for hours
  4. Or for datetime use a time format, both options are shown in the output.
  5. Not implemented - but you can add STATUS to the CLASS statement to summarize by local vs remote if required.

proc means data=staff nway noprint;

class date name;

format date dtdate9.;

output out=want (drop=_Smiley Happy sum(count)=r_count range(date)=duration;

run;

data want_output;

set want;

duration_time=duration;

duration=duration/60/60;

format duration;

format duration_time time5.;

run;

proc print data=want_output;

run;

Contributor
Posts: 20

Re: use of first and last variable in data step

Awesome! Thank you for the quick reply.

Contributor
Posts: 20

Re: use of first and last variable in data step

Hi,

Here is a different scenario where the dates were different with a gap in between, using proc means with range(date)=duration does not work.

DATA staff;

   LENGTH status $ 6;

   INPUT @1 date datetime18. @18 name$ @23 count @25 status $;

CARDS;

04Jan11:00:05:35 Beth 3 remote

04Jan11:10:11:32 Beth 3 remote

04Jan11:15:34:20 Beth 2 remote

04Jan11:16:45:45 Beth 0 remote

04Jan11:17:00:19 Beth 2 remote

04Jan11:18:30:00 Beth 4 remote

04Jan11:20:00:00 Beth 3 remote

04Jan11:22:00:13 Beth 2 remote

04Jan11:00:30:00 Alex 1 local

04Jan11:12:00:00 Alex 2 local

04Jan11:13:30:00 Alex 8 local

04Jan11:14:00:00 Alex 8 local

04Jan11:15:30:00 Alex 2 local

04Jan11:16:00:00 Alex 1 local

05Jan11:17:30:00 Alex 4 local

05Jan11:20:06:00 Alex 2 local

05Jan11:21:12:00 Alex 0 local

05Jan11:22:10:36 Alex 3 local

;

run;

For example for Beth the minimum is 04Jan11:00:05:13 and the maximum is 04Jan11:22:05:13 , giving a duration of 22 hours which is incorrect. Beth worked from 04Jan11:10:11:32 to 04Jan11:00:05:35 (04Jan11:24:05:35) , a duration of approximately 14 hours. How your former code can be changed to obtain the correct duration? Also Alex started at 04Jan11:12:00:00 pm and get of at 04Jan11:00:30:00 (04Jan11:24:10:00)


I appreciate the help

Thanks.

Super User
Posts: 17,784

Re: use of first and last variable in data step

That doesn't make sense.

04Jan11:00:05:35 (04Jan11:24:05:35) would be January 05 not January 04.

In a 24 hour clock there is  no 24 hours, the hours go from 0 to 23. 

Contributor
Posts: 20

Re: use of first and last variable in data step

That is right. What I mean is the following: Beth works from 24APR12:00:07:47 then get offs at 24APR12:07:16:59 and come back at 24APR12:21:51:31 to get off again at 24APR12:23:49:32 the same day

24APR12:00:07:472Beth
24APR12:00:17:210Beth
24APR12:01:40:142Beth
24APR12:01:44:470Beth
24APR12:01:49:500Beth
24APR12:02:01:220Beth
24APR12:02:09:522Beth
24APR12:02:56:150Beth
24APR12:05:35:232Beth
24APR12:06:02:000Beth
24APR12:06:31:260Beth
24APR12:06:45:240Beth
24APR12:06:56:530Beth
24APR12:07:00:542Beth
24APR12:07:16:592Beth
24APR12:21:51:310Beth
24APR12:22:14:442Beth
24APR12:22:22:470Beth
24APR12:22:25:401Beth
24APR12:23:00:400Beth
24APR12:23:05:040Beth
24APR12:23:06:210Beth
24APR12:23:25:390Beth
24APR12:23:49:320Beth

Thank you for your patience

Super User
Posts: 17,784

Re: use of first and last variable in data step

So a duration of about 7 hours and then 2 hours?

How do you identify when a work day starts/ends?

Contributor
Posts: 20

Re: use of first and last variable in data step

Correct.

There is no way to identify in the data when it starts, when it ends. I have a rough idea based on schedule. I think I can't use these dates to find out duration.

Thanks for your help.

Super User
Posts: 17,784

Re: use of first and last variable in data step

You COULD make a rule that if there is a span of more than 5 (or X) hours between times that a new shift has started?

If some thing like that would work, post it as a new question.

Contributor
Posts: 20

Re: use of first and last variable in data step

Yes, will post it as a new question. Thanks.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 358 views
  • 8 likes
  • 3 in conversation