BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AMFR
Quartz | Level 8

Hello all,

My data is below

 

data temp;

input ID TS HR;

informat TS datetime20.;

format TS datetime20.;

datalines;

 

188 18Jul17:15:27:00 97

188 18Jul17:15:28:00 88

188 18Jul17:15:39:00 85

188 18Jul17:15:55:00 88

188 18Jul17:16:00:00 88

188 18Jul17:16:17:00 87

188 18Jul17:16:22:00 86

188 18Jul17:16:35:00 88

188 18Jul17:16:45:00 88

188 18Jul17:17:01:00 85

188 18Jul17:17:29:00 83

188 18Jul17:17:31:00 84

188 18Jul17:18:10:00 83

188 18Jul17:19:00:00 82

188 18Jul17:21:00:00 81;

run;

 

 

what I need to do to is to extract HR at one hour interval (or close to one hour) from the first timestamp of each id. In the above data I have bold the timestamps I am interested in. How can I achieve this task. Thanks you all.

1 ACCEPTED SOLUTION

Accepted Solutions
Jagadishkatam
Amethyst | Level 16

please try the below code which will output only the expected records, i wrote in datastep

 

data temp;
input ID TS HR;
informat TS datetime20.;
format TS datetime20.;
datalines;
188 18Jul17:15:27:00 97
188 18Jul17:15:28:00 88
188 18Jul17:15:39:00 85
188 18Jul17:15:55:00 88
188 18Jul17:16:00:00 88
188 18Jul17:16:17:00 87
188 18Jul17:16:22:00 86
188 18Jul17:16:35:00 88
188 18Jul17:16:45:00 88
188 18Jul17:17:01:00 85
188 18Jul17:17:29:00 83
188 18Jul17:17:31:00 84
188 18Jul17:18:10:00 83
188 18Jul17:19:00:00 82
188 18Jul17:21:00:00 81
;
run;

proc sort data=temp;
by id ts;
run;

data temp2;
set temp;
by id ts;
retain ts2;
if first.id then ts2=ts;
hour=(intck('hour',ts2,ts)*60);
minutes=intck('minute',ts2,ts);
diff=abs(hour-minutes);
format TS2 datetime20.;
run;

proc sort data=temp2;
by id hour diff;
run;

data temp3;
set temp2;
by id hour diff;
if first.hour;
run;
Thanks,
Jag

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

So as I see it, the logic is:

 

You want the first observation for each ID output. Next, you want every observation where the following observation has a TS value that is at least one hour later than the last outputted observation, correct?

error_prone
Barite | Level 11
@PeterClemmensen: not "at least one hour later", but as close as possible to one hour.
What to select, if after
188 18Jul17:18:10:00 83
Is
188 18Jul17:18:48:00 83?
Both are 19 minutes away from the perfect difference of one hour.
AMFR
Quartz | Level 8
It should be as close to one hour, either few minutes less or few minutes more to one hour. Answer for your question that it could be both.
Thanks
AMFR
Quartz | Level 8
Draycut..yes I want the first observation for each of the ID and then TS should be as close to one hour, either few minutes less or more.
Thanks!
mkeintz
PROC Star

@PeterClemmensen has already asked what rule you want when there are ties.   And what rule do you want when at the end of the id?  For instance if the reference time is 13:24  and you have the following complete time sequence for id 101

 

101 13:24   kept

101 13:40

101 13:59

101 14:20  kept    (closest to 14:24)

101 14:50 (keep?)

102 12:66

... ...

 

Do you keep the last 101 observation as closest toe 13:24 plus 2 hours?  the 13:24 and 14:20 will be kept,  but how close must the 5th observation be to the next target time (15:24) to be kept?

 

Edited addition:  and did you want closest to one-hour after previous output, or closest to exact hour multiple after the first record?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
AMFR
Quartz | Level 8

Do you keep the last 101 observation as closest toe 13:24 plus 2 hours?  the 13:24 and 14:20 will be kept,  but how close must the 5th observation be to the next target time (15:24) to be kept? Since 14:20 is almost an hour away from 13:24 so I would like to keep it and drop 14;50 because I want one record for each hour. If there is any timestamp for the 15th hour which is close to 14:20 then it would be in otherwise 14:20 would be the last row for 101. I hope I make myself clear this time.

 

Edited addition:  and did you want closest to one-hour after previous output, or closest to exact hour multiple after the first record? I am sorry I am not getting this question.

 

Thanks for your help!

Jagadishkatam
Amethyst | Level 16

please try the below code

 

the min is the final dataset name which has the flag variable. if you subset min on flag=1 you will get the expected records.

 

hope it helps.

 

data temp;
input ID TS HR;
informat TS datetime20.;
format TS datetime20.;
datalines;
188 18Jul17:15:27:00 97
188 18Jul17:15:28:00 88
188 18Jul17:15:39:00 85
188 18Jul17:15:55:00 88
188 18Jul17:16:00:00 88
188 18Jul17:16:17:00 87
188 18Jul17:16:22:00 86
188 18Jul17:16:35:00 88
188 18Jul17:16:45:00 88
188 18Jul17:17:01:00 85
188 18Jul17:17:29:00 83
188 18Jul17:17:31:00 84
188 18Jul17:18:10:00 83
188 18Jul17:19:00:00 82
188 18Jul17:21:00:00 81
;
run;

proc sort data=temp;
by id ts;
run;

data temp2;
set temp;
by id ts;
retain ts2;
if first.id then ts2=ts;
hour=(intck('hour',ts2,ts)*60);
minutes=intck('minute',ts2,ts);
diff=abs(hour-minutes);
format TS2 datetime20.;
run;

proc sql;
create table min as select distinct a.id,a.ts,a.hr,a.hour,a.diff, case when a.diff ne 0 and a.diff=b.min then 1 else 0 end as flag from temp2 as a left join (select id,ts, min(diff) as min from temp2 group by id, hour) as b 
on a.id=b.id and a.diff=b.min and a.ts=b.ts;
quit;

Thanks,
Jag
AMFR
Quartz | Level 8

Thank a lot for your help. Your codes work well for the first id of my dataset but not for others. SQL part is complicated for me so I have no idea how I can make it work for my whole dataset.

Jagadishkatam
Amethyst | Level 16

please try the below code which will output only the expected records, i wrote in datastep

 

data temp;
input ID TS HR;
informat TS datetime20.;
format TS datetime20.;
datalines;
188 18Jul17:15:27:00 97
188 18Jul17:15:28:00 88
188 18Jul17:15:39:00 85
188 18Jul17:15:55:00 88
188 18Jul17:16:00:00 88
188 18Jul17:16:17:00 87
188 18Jul17:16:22:00 86
188 18Jul17:16:35:00 88
188 18Jul17:16:45:00 88
188 18Jul17:17:01:00 85
188 18Jul17:17:29:00 83
188 18Jul17:17:31:00 84
188 18Jul17:18:10:00 83
188 18Jul17:19:00:00 82
188 18Jul17:21:00:00 81
;
run;

proc sort data=temp;
by id ts;
run;

data temp2;
set temp;
by id ts;
retain ts2;
if first.id then ts2=ts;
hour=(intck('hour',ts2,ts)*60);
minutes=intck('minute',ts2,ts);
diff=abs(hour-minutes);
format TS2 datetime20.;
run;

proc sort data=temp2;
by id hour diff;
run;

data temp3;
set temp2;
by id hour diff;
if first.hour;
run;
Thanks,
Jag
AMFR
Quartz | Level 8

Thanks it worked and easy to understand as well. I really appreciate your help.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3932 views
  • 0 likes
  • 5 in conversation