BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Gothardt
Obsidian | Level 7

Dear SAS-experts 🙂

 

I actually think this is very basic but nevertheless it is causing me some problems. My observations are this:

 

INDATE              OUTDATE       INTIME     OUTTIME        LASTOUTDATE    LASTOUTTIME

15/05/2013       15/05/2013      13               16                    -

16/05/2013       16/05/2013      1                  1                    15/05/2013           16

 

(This is European time, ie. 1=1am and 16= 4pm)

 

I want to combine the 2 observations because there is <12 hours between 15/05/2013 16 (4pm) and 16/05/2013 1.

 

How do tell SAS that IN- and OUTTIME have a maximum of 23 hours (and then after starts at 0 on the next day?) I hope it makes sence...

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User
data have;
informat
  id $1.
  indate ddmmyy10.
  outdate ddmmyy10.
;
format
  id $1.
  indate ddmmyy10.
  outdate ddmmyy10.
;
input id indate outdate intime outtime;
cards;
X 15/05/2013 15/05/2013 13 16
X 16/05/2013 16/05/2013 1 1
Y 24/12/2014 25/12/2014 14 16
Z 02/12/2011 03/12/2011 16 23
Z 04/12/2011 05/12/2011 2 17
Z 06/12/2011 06/12/2011 1 8
W 07/08/2012 08/08/2012 14 20
M 01/01/2012 02/01/2012 16 8
M 02/01/2012 03/01/2012 9 16
;
run;

proc sort data=have;
by id indate intime;
run;

data want;
set have;
by id;
retain numberlines;
format
  indate_dt
  lastoutdate_dt
    datetime19.
;
lastoutdate_dt = dhms(lag(outdate),lag(outtime),0,0);
if first.id then lastoutdate_dt = 0;
indate_dt = dhms(indate,intime,0,0);
if intck('hour',lastoutdate_dt,indate_dt) < 12
then numberlines + 1;
else numberlines = 1;
run;

If you need the original order, you would need to preserve that with setting a variable to _n_ in the first step and sorting the want dataset by that variable.

View solution in original post

8 REPLIES 8
Kurt_Bremser
Super User

Create new timestamp variables:

indt = dhms(indate,intime,0,0);
outdt = dhms(outdate,outtime,0,0);

Now you can use the INCK function to determine intervals

Gothardt
Obsidian | Level 7

Thanks for your answer. I have created the timestamp variables. But how to write the INTCK-syntax with interval for hours? In the INTCK-function description I can only find syntaxes for days and months. I could use days and then afterwards calculate hours, but I'm sure there is an easier way?

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26
data want;
  indt="01JAN2015T00:00"dt;
  outdt="01JAN2015T03:40"dt;
  num_hours=hour(outdt) - hour(indt);
run;

Note, you would need to be careful when going over midnight.

Gothardt
Obsidian | Level 7

But the thing is, I do not know the exact date for all my observations so I have to use a general syntax identifying all observations over midnigt that should be combined (because of <12 hours between).

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, I am not following you.  Could you post - a datastep with some test data, covering all possibilities, and what the output should look like.

Gothardt
Obsidian | Level 7

OBS  ID   INDATE              OUTDATE       INTIME     OUTTIME        LASTOUTDATE    LASTOUTTIME    NUMBERLINES

   1     X   15/05/2013       15/05/2013       13               16                    .                             .                         1

   2     X   16/05/2013       16/05/2013        1                  1                   15/05/2013           16                        2

   3     Y    24/12/2014       25/12/2014      14                16                   .                             .                         1

   4     Z    02/12/2011       03/12/2011      16                23                   .                             .                         1

   5     Z    04/12/2011       05/12/2011       2                 17                  03/12/2011            23                       2

   6     Z    06/12/2011       06/12/2011        1                8                    05/12/2011            17                       3

   7     W   07/08/2012       08/08/2012       14              20                   .                              .                         1

   8     M   01/01/2012       02/01/2012       16               8                    .                              .                         1

   9     M   02/01/2012      03/01/2012         9                16                   02/01/2012           8                         2

 

I work on a 'secret server' so I cannot take data out. But this is different scenarios like in the dataset (but I have over 400.000 observations, so I cancot make syntaxes for all the exact dates). LASTOUTDATE and LASTOUTIME is made with the lag-function. If first.obs then LASTOUTDATE=. and LASTOUTTIME=. NUMBERLINES indicate wheter the lines are combined or not (and how many times).

Kurt_Bremser
Super User
data have;
informat
  id $1.
  indate ddmmyy10.
  outdate ddmmyy10.
;
format
  id $1.
  indate ddmmyy10.
  outdate ddmmyy10.
;
input id indate outdate intime outtime;
cards;
X 15/05/2013 15/05/2013 13 16
X 16/05/2013 16/05/2013 1 1
Y 24/12/2014 25/12/2014 14 16
Z 02/12/2011 03/12/2011 16 23
Z 04/12/2011 05/12/2011 2 17
Z 06/12/2011 06/12/2011 1 8
W 07/08/2012 08/08/2012 14 20
M 01/01/2012 02/01/2012 16 8
M 02/01/2012 03/01/2012 9 16
;
run;

proc sort data=have;
by id indate intime;
run;

data want;
set have;
by id;
retain numberlines;
format
  indate_dt
  lastoutdate_dt
    datetime19.
;
lastoutdate_dt = dhms(lag(outdate),lag(outtime),0,0);
if first.id then lastoutdate_dt = 0;
indate_dt = dhms(indate,intime,0,0);
if intck('hour',lastoutdate_dt,indate_dt) < 12
then numberlines + 1;
else numberlines = 1;
run;

If you need the original order, you would need to preserve that with setting a variable to _n_ in the first step and sorting the want dataset by that variable.

Kurt_Bremser
Super User

@Gothardt wrote:

Thanks for your answer. I have created the timestamp variables. But how to write the INTCK-syntax with interval for hours? In the INTCK-function description I can only find syntaxes for days and months. I could use days and then afterwards calculate hours, but I'm sure there is an easier way?

 


You find the necessary information here: About Date and Time Intervals

data _null_;
indt = '31mar2016:05:00:00'dt;
outdt = '31mar2016:13:00:00'dt;
diff = intck('hour',indt,outdt);
put diff=;
run;

Log excerpt:

16         data _null_;
17         indt = '31mar2016:05:00:00'dt;
18         outdt = '31mar2016:13:00:00'dt;
19         diff = intck('hour',indt,outdt);
20         put diff=;
21         run;

diff=8

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1674 views
  • 1 like
  • 3 in conversation