Solved: How to extract a timestamp with one hour interval

AMFR · Posted 08-20-2018 02:06 PM

Hello all,

My data is below

data temp;

input ID TS HR;

informat TS datetime20.;

format TS datetime20.;

datalines;

188 18Jul17:15:27:00 97

188 18Jul17:15:28:00 88

188 18Jul17:15:39:00 85

188 18Jul17:15:55:00 88

188 18Jul17:16:00:00 88

188 18Jul17:16:17:00 87

188 18Jul17:16:22:00 86

188 18Jul17:16:35:00 88

188 18Jul17:16:45:00 88

188 18Jul17:17:01:00 85

188 18Jul17:17:29:00 83

188 18Jul17:17:31:00 84

188 18Jul17:18:10:00 83

188 18Jul17:19:00:00 82

188 18Jul17:21:00:00 81;

run;

what I need to do to is to extract HR at one hour interval (or close to one hour) from the first timestamp of each id. In the above data I have bold the timestamps I am interested in. How can I achieve this task. Thanks you all.

Jagadishkatam · Posted 08-21-2018 08:17 AM

please try the below code which will output only the expected records, i wrote in datastep

data temp;
input ID TS HR;
informat TS datetime20.;
format TS datetime20.;
datalines;
188 18Jul17:15:27:00 97
188 18Jul17:15:28:00 88
188 18Jul17:15:39:00 85
188 18Jul17:15:55:00 88
188 18Jul17:16:00:00 88
188 18Jul17:16:17:00 87
188 18Jul17:16:22:00 86
188 18Jul17:16:35:00 88
188 18Jul17:16:45:00 88
188 18Jul17:17:01:00 85
188 18Jul17:17:29:00 83
188 18Jul17:17:31:00 84
188 18Jul17:18:10:00 83
188 18Jul17:19:00:00 82
188 18Jul17:21:00:00 81
;
run;

proc sort data=temp;
by id ts;
run;

data temp2;
set temp;
by id ts;
retain ts2;
if first.id then ts2=ts;
hour=(intck('hour',ts2,ts)*60);
minutes=intck('minute',ts2,ts);
diff=abs(hour-minutes);
format TS2 datetime20.;
run;

proc sort data=temp2;
by id hour diff;
run;

data temp3;
set temp2;
by id hour diff;
if first.hour;
run;

Thanks,
Jag

View solution in original post

PeterClemmensen · Posted 08-20-2018 04:31 PM

So as I see it, the logic is:

You want the first observation for each ID output. Next, you want every observation where the following observation has a TS value that is at least one hour later than the last outputted observation, correct?

The DATA to DATA Step Macro
Blog: SASnrd

error_prone · Posted 08-20-2018 04:41 PM

@PeterClemmensen: not "at least one hour later", but as close as possible to one hour.
What to select, if after
188 18Jul17:18:10:00 83
Is
188 18Jul17:18:48:00 83?
Both are 19 minutes away from the perfect difference of one hour.

AMFR · Posted 08-20-2018 05:25 PM

It should be as close to one hour, either few minutes less or few minutes more to one hour. Answer for your question that it could be both.
Thanks

AMFR · Posted 08-20-2018 05:29 PM

Draycut..yes I want the first observation for each of the ID and then TS should be as close to one hour, either few minutes less or more.
Thanks!

mkeintz · Posted 08-20-2018 05:50 PM

@PeterClemmensen has already asked what rule you want when there are ties. And what rule do you want when at the end of the id? For instance if the reference time is 13:24 and you have the following complete time sequence for id 101

101 13:24 kept

101 13:40

101 13:59

101 14:20 kept (closest to 14:24)

101 14:50 (keep?)

102 12:66

... ...

Do you keep the last 101 observation as closest toe 13:24 plus 2 hours? the 13:24 and 14:20 will be kept, but how close must the 5th observation be to the next target time (15:24) to be kept?

Edited addition: and did you want closest to one-hour after previous output, or closest to exact hour multiple after the first record?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

AMFR · Posted 08-21-2018 06:38 AM

Do you keep the last 101 observation as closest toe 13:24 plus 2 hours? the 13:24 and 14:20 will be kept, but how close must the 5th observation be to the next target time (15:24) to be kept? Since 14:20 is almost an hour away from 13:24 so I would like to keep it and drop 14;50 because I want one record for each hour. If there is any timestamp for the 15th hour which is close to 14:20 then it would be in otherwise 14:20 would be the last row for 101. I hope I make myself clear this time.

Edited addition: and did you want closest to one-hour after previous output, or closest to exact hour multiple after the first record? I am sorry I am not getting this question.

Thanks for your help!

Jagadishkatam · Posted 08-21-2018 04:51 AM

please try the below code

the min is the final dataset name which has the flag variable. if you subset min on flag=1 you will get the expected records.

hope it helps.

data temp;
input ID TS HR;
informat TS datetime20.;
format TS datetime20.;
datalines;
188 18Jul17:15:27:00 97
188 18Jul17:15:28:00 88
188 18Jul17:15:39:00 85
188 18Jul17:15:55:00 88
188 18Jul17:16:00:00 88
188 18Jul17:16:17:00 87
188 18Jul17:16:22:00 86
188 18Jul17:16:35:00 88
188 18Jul17:16:45:00 88
188 18Jul17:17:01:00 85
188 18Jul17:17:29:00 83
188 18Jul17:17:31:00 84
188 18Jul17:18:10:00 83
188 18Jul17:19:00:00 82
188 18Jul17:21:00:00 81
;
run;

proc sort data=temp;
by id ts;
run;

data temp2;
set temp;
by id ts;
retain ts2;
if first.id then ts2=ts;
hour=(intck('hour',ts2,ts)*60);
minutes=intck('minute',ts2,ts);
diff=abs(hour-minutes);
format TS2 datetime20.;
run;

proc sql;
create table min as select distinct a.id,a.ts,a.hr,a.hour,a.diff, case when a.diff ne 0 and a.diff=b.min then 1 else 0 end as flag from temp2 as a left join (select id,ts, min(diff) as min from temp2 group by id, hour) as b 
on a.id=b.id and a.diff=b.min and a.ts=b.ts;
quit;

Thanks,
Jag

AMFR · Posted 08-21-2018 06:56 AM

Thank a lot for your help. Your codes work well for the first id of my dataset but not for others. SQL part is complicated for me so I have no idea how I can make it work for my whole dataset.

Jagadishkatam · Posted 08-21-2018 08:17 AM

please try the below code which will output only the expected records, i wrote in datastep

data temp;
input ID TS HR;
informat TS datetime20.;
format TS datetime20.;
datalines;
188 18Jul17:15:27:00 97
188 18Jul17:15:28:00 88
188 18Jul17:15:39:00 85
188 18Jul17:15:55:00 88
188 18Jul17:16:00:00 88
188 18Jul17:16:17:00 87
188 18Jul17:16:22:00 86
188 18Jul17:16:35:00 88
188 18Jul17:16:45:00 88
188 18Jul17:17:01:00 85
188 18Jul17:17:29:00 83
188 18Jul17:17:31:00 84
188 18Jul17:18:10:00 83
188 18Jul17:19:00:00 82
188 18Jul17:21:00:00 81
;
run;

proc sort data=temp;
by id ts;
run;

data temp2;
set temp;
by id ts;
retain ts2;
if first.id then ts2=ts;
hour=(intck('hour',ts2,ts)*60);
minutes=intck('minute',ts2,ts);
diff=abs(hour-minutes);
format TS2 datetime20.;
run;

proc sort data=temp2;
by id hour diff;
run;

data temp3;
set temp2;
by id hour diff;
if first.hour;
run;

Thanks,
Jag

AMFR · Posted 08-21-2018 09:17 AM

Thanks it worked and easy to understand as well. I really appreciate your help.

How to extract a timestamp with one hour interval

Re: How to extract a timestamp with one hour interval

Re: How to extract a timestamp with one hour interval

Re: How to extract a timestamp with one hour interval

Re: How to extract a timestamp with one hour interval

Re: How to extract a timestamp with one hour interval

Re: How to extract a timestamp with one hour interval

Re: How to extract a timestamp with one hour interval

Re: How to extract a timestamp with one hour interval

Re: How to extract a timestamp with one hour interval

Re: How to extract a timestamp with one hour interval

Re: How to extract a timestamp with one hour interval

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!