Dear SAS-experts 🙂
I actually think this is very basic but nevertheless it is causing me some problems. My observations are this:
INDATE OUTDATE INTIME OUTTIME LASTOUTDATE LASTOUTTIME
15/05/2013 15/05/2013 13 16 -
16/05/2013 16/05/2013 1 1 15/05/2013 16
(This is European time, ie. 1=1am and 16= 4pm)
I want to combine the 2 observations because there is <12 hours between 15/05/2013 16 (4pm) and 16/05/2013 1.
How do tell SAS that IN- and OUTTIME have a maximum of 23 hours (and then after starts at 0 on the next day?) I hope it makes sence...
data have;
informat
id $1.
indate ddmmyy10.
outdate ddmmyy10.
;
format
id $1.
indate ddmmyy10.
outdate ddmmyy10.
;
input id indate outdate intime outtime;
cards;
X 15/05/2013 15/05/2013 13 16
X 16/05/2013 16/05/2013 1 1
Y 24/12/2014 25/12/2014 14 16
Z 02/12/2011 03/12/2011 16 23
Z 04/12/2011 05/12/2011 2 17
Z 06/12/2011 06/12/2011 1 8
W 07/08/2012 08/08/2012 14 20
M 01/01/2012 02/01/2012 16 8
M 02/01/2012 03/01/2012 9 16
;
run;
proc sort data=have;
by id indate intime;
run;
data want;
set have;
by id;
retain numberlines;
format
indate_dt
lastoutdate_dt
datetime19.
;
lastoutdate_dt = dhms(lag(outdate),lag(outtime),0,0);
if first.id then lastoutdate_dt = 0;
indate_dt = dhms(indate,intime,0,0);
if intck('hour',lastoutdate_dt,indate_dt) < 12
then numberlines + 1;
else numberlines = 1;
run;
If you need the original order, you would need to preserve that with setting a variable to _n_ in the first step and sorting the want dataset by that variable.
Create new timestamp variables:
indt = dhms(indate,intime,0,0);
outdt = dhms(outdate,outtime,0,0);
Now you can use the INCK function to determine intervals
Thanks for your answer. I have created the timestamp variables. But how to write the INTCK-syntax with interval for hours? In the INTCK-function description I can only find syntaxes for days and months. I could use days and then afterwards calculate hours, but I'm sure there is an easier way?
data want; indt="01JAN2015T00:00"dt; outdt="01JAN2015T03:40"dt; num_hours=hour(outdt) - hour(indt); run;
Note, you would need to be careful when going over midnight.
But the thing is, I do not know the exact date for all my observations so I have to use a general syntax identifying all observations over midnigt that should be combined (because of <12 hours between).
Sorry, I am not following you. Could you post - a datastep with some test data, covering all possibilities, and what the output should look like.
OBS ID INDATE OUTDATE INTIME OUTTIME LASTOUTDATE LASTOUTTIME NUMBERLINES
1 X 15/05/2013 15/05/2013 13 16 . . 1
2 X 16/05/2013 16/05/2013 1 1 15/05/2013 16 2
3 Y 24/12/2014 25/12/2014 14 16 . . 1
4 Z 02/12/2011 03/12/2011 16 23 . . 1
5 Z 04/12/2011 05/12/2011 2 17 03/12/2011 23 2
6 Z 06/12/2011 06/12/2011 1 8 05/12/2011 17 3
7 W 07/08/2012 08/08/2012 14 20 . . 1
8 M 01/01/2012 02/01/2012 16 8 . . 1
9 M 02/01/2012 03/01/2012 9 16 02/01/2012 8 2
I work on a 'secret server' so I cannot take data out. But this is different scenarios like in the dataset (but I have over 400.000 observations, so I cancot make syntaxes for all the exact dates). LASTOUTDATE and LASTOUTIME is made with the lag-function. If first.obs then LASTOUTDATE=. and LASTOUTTIME=. NUMBERLINES indicate wheter the lines are combined or not (and how many times).
data have;
informat
id $1.
indate ddmmyy10.
outdate ddmmyy10.
;
format
id $1.
indate ddmmyy10.
outdate ddmmyy10.
;
input id indate outdate intime outtime;
cards;
X 15/05/2013 15/05/2013 13 16
X 16/05/2013 16/05/2013 1 1
Y 24/12/2014 25/12/2014 14 16
Z 02/12/2011 03/12/2011 16 23
Z 04/12/2011 05/12/2011 2 17
Z 06/12/2011 06/12/2011 1 8
W 07/08/2012 08/08/2012 14 20
M 01/01/2012 02/01/2012 16 8
M 02/01/2012 03/01/2012 9 16
;
run;
proc sort data=have;
by id indate intime;
run;
data want;
set have;
by id;
retain numberlines;
format
indate_dt
lastoutdate_dt
datetime19.
;
lastoutdate_dt = dhms(lag(outdate),lag(outtime),0,0);
if first.id then lastoutdate_dt = 0;
indate_dt = dhms(indate,intime,0,0);
if intck('hour',lastoutdate_dt,indate_dt) < 12
then numberlines + 1;
else numberlines = 1;
run;
If you need the original order, you would need to preserve that with setting a variable to _n_ in the first step and sorting the want dataset by that variable.
@Gothardt wrote:
Thanks for your answer. I have created the timestamp variables. But how to write the INTCK-syntax with interval for hours? In the INTCK-function description I can only find syntaxes for days and months. I could use days and then afterwards calculate hours, but I'm sure there is an easier way?
You find the necessary information here: About Date and Time Intervals
data _null_;
indt = '31mar2016:05:00:00'dt;
outdt = '31mar2016:13:00:00'dt;
diff = intck('hour',indt,outdt);
put diff=;
run;
Log excerpt:
16 data _null_; 17 indt = '31mar2016:05:00:00'dt; 18 outdt = '31mar2016:13:00:00'dt; 19 diff = intck('hour',indt,outdt); 20 put diff=; 21 run; diff=8
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.