Dear SAS-experts 🙂
I actually think this is very basic but nevertheless it is causing me some problems. My observations are this:
INDATE OUTDATE INTIME OUTTIME LASTOUTDATE LASTOUTTIME
15/05/2013 15/05/2013 13 16 -
16/05/2013 16/05/2013 1 1 15/05/2013 16
(This is European time, ie. 1=1am and 16= 4pm)
I want to combine the 2 observations because there is <12 hours between 15/05/2013 16 (4pm) and 16/05/2013 1.
How do tell SAS that IN- and OUTTIME have a maximum of 23 hours (and then after starts at 0 on the next day?) I hope it makes sence...
data have;
informat
id $1.
indate ddmmyy10.
outdate ddmmyy10.
;
format
id $1.
indate ddmmyy10.
outdate ddmmyy10.
;
input id indate outdate intime outtime;
cards;
X 15/05/2013 15/05/2013 13 16
X 16/05/2013 16/05/2013 1 1
Y 24/12/2014 25/12/2014 14 16
Z 02/12/2011 03/12/2011 16 23
Z 04/12/2011 05/12/2011 2 17
Z 06/12/2011 06/12/2011 1 8
W 07/08/2012 08/08/2012 14 20
M 01/01/2012 02/01/2012 16 8
M 02/01/2012 03/01/2012 9 16
;
run;
proc sort data=have;
by id indate intime;
run;
data want;
set have;
by id;
retain numberlines;
format
indate_dt
lastoutdate_dt
datetime19.
;
lastoutdate_dt = dhms(lag(outdate),lag(outtime),0,0);
if first.id then lastoutdate_dt = 0;
indate_dt = dhms(indate,intime,0,0);
if intck('hour',lastoutdate_dt,indate_dt) < 12
then numberlines + 1;
else numberlines = 1;
run;
If you need the original order, you would need to preserve that with setting a variable to _n_ in the first step and sorting the want dataset by that variable.
Create new timestamp variables:
indt = dhms(indate,intime,0,0);
outdt = dhms(outdate,outtime,0,0);
Now you can use the INCK function to determine intervals
Thanks for your answer. I have created the timestamp variables. But how to write the INTCK-syntax with interval for hours? In the INTCK-function description I can only find syntaxes for days and months. I could use days and then afterwards calculate hours, but I'm sure there is an easier way?
data want; indt="01JAN2015T00:00"dt; outdt="01JAN2015T03:40"dt; num_hours=hour(outdt) - hour(indt); run;
Note, you would need to be careful when going over midnight.
But the thing is, I do not know the exact date for all my observations so I have to use a general syntax identifying all observations over midnigt that should be combined (because of <12 hours between).
Sorry, I am not following you. Could you post - a datastep with some test data, covering all possibilities, and what the output should look like.
OBS ID INDATE OUTDATE INTIME OUTTIME LASTOUTDATE LASTOUTTIME NUMBERLINES
1 X 15/05/2013 15/05/2013 13 16 . . 1
2 X 16/05/2013 16/05/2013 1 1 15/05/2013 16 2
3 Y 24/12/2014 25/12/2014 14 16 . . 1
4 Z 02/12/2011 03/12/2011 16 23 . . 1
5 Z 04/12/2011 05/12/2011 2 17 03/12/2011 23 2
6 Z 06/12/2011 06/12/2011 1 8 05/12/2011 17 3
7 W 07/08/2012 08/08/2012 14 20 . . 1
8 M 01/01/2012 02/01/2012 16 8 . . 1
9 M 02/01/2012 03/01/2012 9 16 02/01/2012 8 2
I work on a 'secret server' so I cannot take data out. But this is different scenarios like in the dataset (but I have over 400.000 observations, so I cancot make syntaxes for all the exact dates). LASTOUTDATE and LASTOUTIME is made with the lag-function. If first.obs then LASTOUTDATE=. and LASTOUTTIME=. NUMBERLINES indicate wheter the lines are combined or not (and how many times).
data have;
informat
id $1.
indate ddmmyy10.
outdate ddmmyy10.
;
format
id $1.
indate ddmmyy10.
outdate ddmmyy10.
;
input id indate outdate intime outtime;
cards;
X 15/05/2013 15/05/2013 13 16
X 16/05/2013 16/05/2013 1 1
Y 24/12/2014 25/12/2014 14 16
Z 02/12/2011 03/12/2011 16 23
Z 04/12/2011 05/12/2011 2 17
Z 06/12/2011 06/12/2011 1 8
W 07/08/2012 08/08/2012 14 20
M 01/01/2012 02/01/2012 16 8
M 02/01/2012 03/01/2012 9 16
;
run;
proc sort data=have;
by id indate intime;
run;
data want;
set have;
by id;
retain numberlines;
format
indate_dt
lastoutdate_dt
datetime19.
;
lastoutdate_dt = dhms(lag(outdate),lag(outtime),0,0);
if first.id then lastoutdate_dt = 0;
indate_dt = dhms(indate,intime,0,0);
if intck('hour',lastoutdate_dt,indate_dt) < 12
then numberlines + 1;
else numberlines = 1;
run;
If you need the original order, you would need to preserve that with setting a variable to _n_ in the first step and sorting the want dataset by that variable.
@Gothardt wrote:
Thanks for your answer. I have created the timestamp variables. But how to write the INTCK-syntax with interval for hours? In the INTCK-function description I can only find syntaxes for days and months. I could use days and then afterwards calculate hours, but I'm sure there is an easier way?
You find the necessary information here: About Date and Time Intervals
data _null_;
indt = '31mar2016:05:00:00'dt;
outdt = '31mar2016:13:00:00'dt;
diff = intck('hour',indt,outdt);
put diff=;
run;
Log excerpt:
16 data _null_; 17 indt = '31mar2016:05:00:00'dt; 18 outdt = '31mar2016:13:00:00'dt; 19 diff = intck('hour',indt,outdt); 20 put diff=; 21 run; diff=8
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.