Existence and mechanisms of a reinforcement-learning ‘cross point’ after strong cold starts
Determine whether there exists a cross point at which, after a sufficiently strong supervised fine-tuning (SFT) cold start, subsequent reinforcement learning (RL) yields no additional performance improvement for data-analytic agents. If such a point exists, characterize the fundamental mechanisms (such as policy-space saturation, diminishing exploratory signal, or reward-model limitations) that render further RL ineffective.
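One way to make the question operational is to sweep SFT checkpoints of increasing cold-start strength, run RL from each, and test whether the post-RL gain remains statistically distinguishable from zero. The sketch below, in Python, illustrates this protocol under stated assumptions: `eval_fn` and `rl_fn` are hypothetical interfaces standing in for whatever evaluation harness and RL pipeline a given study uses, and the paired bootstrap is one plausible significance test, not a prescribed one.

```python
"""Hypothetical sketch for probing an SFT-strength 'cross point' beyond which
RL adds no measurable gain. All interface names and constants are illustrative."""
import numpy as np


def rl_gain_is_significant(pre_scores, post_scores, n_boot=10_000, alpha=0.05, seed=0):
    """Paired bootstrap test: does RL improve per-task scores over the SFT checkpoint?

    pre_scores / post_scores: per-task evaluation scores before and after RL.
    Returns True if the lower (alpha) bootstrap quantile of the mean per-task
    gain lies above zero, i.e. the RL improvement is distinguishable from noise.
    """
    rng = np.random.default_rng(seed)
    deltas = np.asarray(post_scores) - np.asarray(pre_scores)
    # Resample tasks with replacement and compute the mean gain per resample.
    boots = rng.choice(deltas, size=(n_boot, len(deltas)), replace=True).mean(axis=1)
    return np.quantile(boots, alpha) > 0.0


def find_cross_point(checkpoints, eval_fn, rl_fn):
    """Scan SFT checkpoints ordered by cold-start strength (weakest -> strongest)
    and report the first one at which subsequent RL yields no significant gain,
    i.e. the candidate cross point.

    eval_fn(policy) -> per-task score array; rl_fn(policy) -> RL-tuned policy.
    Both are assumed interfaces, not part of any real library.
    """
    for i, policy in enumerate(checkpoints):
        pre = eval_fn(policy)
        post = eval_fn(rl_fn(policy))
        if not rl_gain_is_significant(pre, post):
            return i  # RL stops helping at this cold-start strength
    return None  # no cross point observed in the scanned range
```

A null result from this scan would not by itself settle the mechanism; distinguishing policy-space saturation from reward-model limitations would additionally require, for example, checking whether an oracle reward recovers the RL gain.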
References
Setting aside overfitting, the current trend suggests that a cross point may emerge at which a sufficiently strong cold start leaves no room for further improvement via RL. Whether such a point truly exists and, if it does, what fundamental mechanisms (e.g., saturation of the policy space, diminishing exploratory signal, or intrinsic limitations of the reward model) render further RL ineffective remains an important open question for future work.