I have a time series of values (e.g. speed of a car) and I want to understand the number of times that value exceeds a number for certain durations (e.g. "exceeded 55mph for more than 8 seconds" occurred 5 times during this 10 hour trip.
In excel I created a column that started counting when the value was above 55, and then reset to 0 when it fell below again. I created a second column that took the difference of the prior row and current row of that first calculated count column, which gave the highest number in each series plus a bunch of -1 values. Ignoring the -1 and 0 values, I can then get my desired count.
The full data set is too large for excel and i would prefer to run this in SAS, and in a data step, though Proc SQL is good also.
The excel method works, but I am stumped on how to get that first counting column that resets to zero when the continuous variable falls below the threshold. I don't think first. and last. would work, but maybe with an if/then statement? I am not sure how to reset an if/then count.
Assuming the excel method is a good path, how do I get that first counting column?
raw data example
date_time | speed | Count above 55 | count above 42 |
11DEC16:23:53:21 | 35.00 | 0 | 0 |
11DEC16:23:53:22 | 39.00 | 0 | 0 |
11DEC16:23:53:23 | 45.00 | 0 | 1 |
11DEC16:23:53:24 | 55.00 | 1 | 2 |
11DEC16:23:53:25 | 56.00 | 2 | 3 |
11DEC16:23:53:26 | 58.00 | 3 | 4 |
11DEC16:23:53:27 | 56.00 | 4 | 5 |
11DEC16:23:53:28 | 55.00 | 5 | 6 |
11DEC16:23:53:29 | 50.00 | 0 | 7 |
11DEC16:23:53:30 | 40.00 | 0 | 0 |
11DEC16:23:53:31 | 53.00 | 0 | 1 |
11DEC16:23:53:32 | 55.00 | 1 | 2 |
11DEC16:23:53:33 | 56.00 | 2 | 3 |
11DEC16:23:53:34 | 58.00 | 3 | 4 |
the final product would look like:
Number of times above55, sustained for xx seconds | |
xx | counts |
1 | |
2 | |
3 | 1 |
4 | |
5 | 1 |
6 | |
7 | |
8 | |
9 | |
10 |
Number of times above 42, sustained for xx seconds | |
xx | counts |
1 | |
2 | |
3 | |
4 | 1 |
5 | |
6 | |
7 | 1 |
8 | |
9 | |
10 |
This solution assumes the time data is continuous, i.e. each record is separated from the next by 1 second only.
You can use the retain statement to get the incremental counts of speeds above (and, apparently, also equal) to 42 and 55.
data have;
input date_time speed;
informat date_time datetime16.;
format date_time datetime16.;
cards;
11DEC16:23:53:21 35.00
11DEC16:23:53:22 39.00
11DEC16:23:53:23 45.00
11DEC16:23:53:24 55.00
11DEC16:23:53:25 56.00
11DEC16:23:53:26 58.00
11DEC16:23:53:27 56.00
11DEC16:23:53:28 55.00
11DEC16:23:53:29 50.00
11DEC16:23:53:30 40.00
11DEC16:23:53:31 53.00
11DEC16:23:53:32 55.00
11DEC16:23:53:33 56.00
11DEC16:23:53:34 58.00
;
run;
proc sort data=have out=have_sorted;
by date_time;
run;
data want;
set have_sorted;
by date_time;
retain above_42 above_55;
if speed >= 42 then do;
above_42 + 1;
if speed >= 55 then above_55 + 1;
else above_55 = 0;
end;
else above_42 = 0;
if missing(above_42) then above_42 = 0;
if missing(above_55) then above_55 = 0;
run;
And if you wanted the final counts for each number of sustained seconds:
/* Group runs for sustained counts */
data groupings;
set want;
by date_time;
retain group_42 group_55;
if _n_ = 1 then do;
group_42 = 1;
group_55 = 1;
end;
else do;
if above_42 = 0 then group_42 + 1;
if above_55 = 0 then group_55 + 1;
end;
run;
proc sql noprint;
create table sustained_42 as
select distinct seconds, count(distinct group_42) as count_42
from (select distinct max(above_42) as seconds, group_42
from groupings
where above_42 ^= 0
group by group_42)
group by seconds
order by seconds
;
create table sustained_55 as
select distinct seconds, count(distinct group_55) as count_55
from (select distinct max(above_55) as seconds, group_55
from groupings
where above_55 ^= 0
group by group_55)
group by seconds
order by seconds
;
quit;
data perfect;
do seconds = 1 to 10;
output;
end;
run;
data want_final;
merge perfect sustained_42 sustained_55;
by seconds;
array counts count_42 count_55;
do over counts;
if missing(counts) then counts = 0;
end;
run;
This solution assumes the time data is continuous, i.e. each record is separated from the next by 1 second only.
You can use the retain statement to get the incremental counts of speeds above (and, apparently, also equal) to 42 and 55.
data have;
input date_time speed;
informat date_time datetime16.;
format date_time datetime16.;
cards;
11DEC16:23:53:21 35.00
11DEC16:23:53:22 39.00
11DEC16:23:53:23 45.00
11DEC16:23:53:24 55.00
11DEC16:23:53:25 56.00
11DEC16:23:53:26 58.00
11DEC16:23:53:27 56.00
11DEC16:23:53:28 55.00
11DEC16:23:53:29 50.00
11DEC16:23:53:30 40.00
11DEC16:23:53:31 53.00
11DEC16:23:53:32 55.00
11DEC16:23:53:33 56.00
11DEC16:23:53:34 58.00
;
run;
proc sort data=have out=have_sorted;
by date_time;
run;
data want;
set have_sorted;
by date_time;
retain above_42 above_55;
if speed >= 42 then do;
above_42 + 1;
if speed >= 55 then above_55 + 1;
else above_55 = 0;
end;
else above_42 = 0;
if missing(above_42) then above_42 = 0;
if missing(above_55) then above_55 = 0;
run;
And if you wanted the final counts for each number of sustained seconds:
/* Group runs for sustained counts */
data groupings;
set want;
by date_time;
retain group_42 group_55;
if _n_ = 1 then do;
group_42 = 1;
group_55 = 1;
end;
else do;
if above_42 = 0 then group_42 + 1;
if above_55 = 0 then group_55 + 1;
end;
run;
proc sql noprint;
create table sustained_42 as
select distinct seconds, count(distinct group_42) as count_42
from (select distinct max(above_42) as seconds, group_42
from groupings
where above_42 ^= 0
group by group_42)
group by seconds
order by seconds
;
create table sustained_55 as
select distinct seconds, count(distinct group_55) as count_55
from (select distinct max(above_55) as seconds, group_55
from groupings
where above_55 ^= 0
group by group_55)
group by seconds
order by seconds
;
quit;
data perfect;
do seconds = 1 to 10;
output;
end;
run;
data want_final;
merge perfect sustained_42 sustained_55;
by seconds;
array counts count_42 count_55;
do over counts;
if missing(counts) then counts = 0;
end;
run;
The first part works great. I changed the if-then-else block however to this construct:
above_42 = ifn(speed>= 15, above_42 + 1,0);
above_55 = ifn(speed>= 40, above_55 + 1,0);
what is the purpose of "if missing(above_42) then above_42=0;
I would have expected this to be 'if missing the speed variable, then set the counter = 0'.
everything else is the same. Thanks!
if missing(above_42) then above_42=0;
This just rewrites initial missings (before the counters actually kick in) to zero. If your solution works, and plays well with the second block of code I posted, then stick with that.
I would propose to start with:
data speed;
input dt :datetime. speed;
format dt datetime.;
datalines;
11DEC16:23:53:21 35.00
11DEC16:23:53:22 39.00
11DEC16:23:53:23 45.00
11DEC16:23:53:24 55.00
11DEC16:23:53:25 56.00
11DEC16:23:53:26 58.00
11DEC16:23:53:27 56.00
11DEC16:23:53:28 55.00
11DEC16:23:53:29 50.00
11DEC16:23:53:30 40.00
11DEC16:23:53:31 53.00
11DEC16:23:53:32 55.00
11DEC16:23:53:33 56.00
11DEC16:23:53:34 58.00
;
data want;
array above{2} _temporary_ (42, 55);
array start{2} _temporary_;
array stop{2} _temporary_;
set speed end=done;
do i = 1 to dim(above);
limit = above{i};
begin = start{i};
end = stop{i};
if start{i} then do;
if speed < above{i} then do;
output;
call missing(start{i});
end;
else if done then do;
end = dt;
output;
end;
end;
else if speed >= above{i} then start{i} = dt;
stop{i} = dt;
end;
format begin end datetime.;
keep limit begin end;
run;
proc print data=want noobs; run;
limit begin end 55 11DEC16:23:53:24 11DEC16:23:53:28 42 11DEC16:23:53:23 11DEC16:23:53:29 42 11DEC16:23:53:31 11DEC16:23:53:34 55 11DEC16:23:53:32 11DEC16:23:53:34
From there, it's only a matter of summing and counting.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.