I have a time series of values (e.g. speed of a car) and I want to understand the number of times that value exceeds a number for certain durations (e.g. "exceeded 55mph for more than 8 seconds" occurred 5 times during this 10 hour trip.
In excel I created a column that started counting when the value was above 55, and then reset to 0 when it fell below again. I created a second column that took the difference of the prior row and current row of that first calculated count column, which gave the highest number in each series plus a bunch of -1 values. Ignoring the -1 and 0 values, I can then get my desired count.
The full data set is too large for excel and i would prefer to run this in SAS, and in a data step, though Proc SQL is good also.
The excel method works, but I am stumped on how to get that first counting column that resets to zero when the continuous variable falls below the threshold. I don't think first. and last. would work, but maybe with an if/then statement? I am not sure how to reset an if/then count.
Assuming the excel method is a good path, how do I get that first counting column?
raw data example
date_time | speed | Count above 55 | count above 42 |
11DEC16:23:53:21 | 35.00 | 0 | 0 |
11DEC16:23:53:22 | 39.00 | 0 | 0 |
11DEC16:23:53:23 | 45.00 | 0 | 1 |
11DEC16:23:53:24 | 55.00 | 1 | 2 |
11DEC16:23:53:25 | 56.00 | 2 | 3 |
11DEC16:23:53:26 | 58.00 | 3 | 4 |
11DEC16:23:53:27 | 56.00 | 4 | 5 |
11DEC16:23:53:28 | 55.00 | 5 | 6 |
11DEC16:23:53:29 | 50.00 | 0 | 7 |
11DEC16:23:53:30 | 40.00 | 0 | 0 |
11DEC16:23:53:31 | 53.00 | 0 | 1 |
11DEC16:23:53:32 | 55.00 | 1 | 2 |
11DEC16:23:53:33 | 56.00 | 2 | 3 |
11DEC16:23:53:34 | 58.00 | 3 | 4 |
the final product would look like:
Number of times above55, sustained for xx seconds | |
xx | counts |
1 | |
2 | |
3 | 1 |
4 | |
5 | 1 |
6 | |
7 | |
8 | |
9 | |
10 |
Number of times above 42, sustained for xx seconds | |
xx | counts |
1 | |
2 | |
3 | |
4 | 1 |
5 | |
6 | |
7 | 1 |
8 | |
9 | |
10 |
This solution assumes the time data is continuous, i.e. each record is separated from the next by 1 second only.
You can use the retain statement to get the incremental counts of speeds above (and, apparently, also equal) to 42 and 55.
data have;
input date_time speed;
informat date_time datetime16.;
format date_time datetime16.;
cards;
11DEC16:23:53:21 35.00
11DEC16:23:53:22 39.00
11DEC16:23:53:23 45.00
11DEC16:23:53:24 55.00
11DEC16:23:53:25 56.00
11DEC16:23:53:26 58.00
11DEC16:23:53:27 56.00
11DEC16:23:53:28 55.00
11DEC16:23:53:29 50.00
11DEC16:23:53:30 40.00
11DEC16:23:53:31 53.00
11DEC16:23:53:32 55.00
11DEC16:23:53:33 56.00
11DEC16:23:53:34 58.00
;
run;
proc sort data=have out=have_sorted;
by date_time;
run;
data want;
set have_sorted;
by date_time;
retain above_42 above_55;
if speed >= 42 then do;
above_42 + 1;
if speed >= 55 then above_55 + 1;
else above_55 = 0;
end;
else above_42 = 0;
if missing(above_42) then above_42 = 0;
if missing(above_55) then above_55 = 0;
run;
And if you wanted the final counts for each number of sustained seconds:
/* Group runs for sustained counts */
data groupings;
set want;
by date_time;
retain group_42 group_55;
if _n_ = 1 then do;
group_42 = 1;
group_55 = 1;
end;
else do;
if above_42 = 0 then group_42 + 1;
if above_55 = 0 then group_55 + 1;
end;
run;
proc sql noprint;
create table sustained_42 as
select distinct seconds, count(distinct group_42) as count_42
from (select distinct max(above_42) as seconds, group_42
from groupings
where above_42 ^= 0
group by group_42)
group by seconds
order by seconds
;
create table sustained_55 as
select distinct seconds, count(distinct group_55) as count_55
from (select distinct max(above_55) as seconds, group_55
from groupings
where above_55 ^= 0
group by group_55)
group by seconds
order by seconds
;
quit;
data perfect;
do seconds = 1 to 10;
output;
end;
run;
data want_final;
merge perfect sustained_42 sustained_55;
by seconds;
array counts count_42 count_55;
do over counts;
if missing(counts) then counts = 0;
end;
run;
This solution assumes the time data is continuous, i.e. each record is separated from the next by 1 second only.
You can use the retain statement to get the incremental counts of speeds above (and, apparently, also equal) to 42 and 55.
data have;
input date_time speed;
informat date_time datetime16.;
format date_time datetime16.;
cards;
11DEC16:23:53:21 35.00
11DEC16:23:53:22 39.00
11DEC16:23:53:23 45.00
11DEC16:23:53:24 55.00
11DEC16:23:53:25 56.00
11DEC16:23:53:26 58.00
11DEC16:23:53:27 56.00
11DEC16:23:53:28 55.00
11DEC16:23:53:29 50.00
11DEC16:23:53:30 40.00
11DEC16:23:53:31 53.00
11DEC16:23:53:32 55.00
11DEC16:23:53:33 56.00
11DEC16:23:53:34 58.00
;
run;
proc sort data=have out=have_sorted;
by date_time;
run;
data want;
set have_sorted;
by date_time;
retain above_42 above_55;
if speed >= 42 then do;
above_42 + 1;
if speed >= 55 then above_55 + 1;
else above_55 = 0;
end;
else above_42 = 0;
if missing(above_42) then above_42 = 0;
if missing(above_55) then above_55 = 0;
run;
And if you wanted the final counts for each number of sustained seconds:
/* Group runs for sustained counts */
data groupings;
set want;
by date_time;
retain group_42 group_55;
if _n_ = 1 then do;
group_42 = 1;
group_55 = 1;
end;
else do;
if above_42 = 0 then group_42 + 1;
if above_55 = 0 then group_55 + 1;
end;
run;
proc sql noprint;
create table sustained_42 as
select distinct seconds, count(distinct group_42) as count_42
from (select distinct max(above_42) as seconds, group_42
from groupings
where above_42 ^= 0
group by group_42)
group by seconds
order by seconds
;
create table sustained_55 as
select distinct seconds, count(distinct group_55) as count_55
from (select distinct max(above_55) as seconds, group_55
from groupings
where above_55 ^= 0
group by group_55)
group by seconds
order by seconds
;
quit;
data perfect;
do seconds = 1 to 10;
output;
end;
run;
data want_final;
merge perfect sustained_42 sustained_55;
by seconds;
array counts count_42 count_55;
do over counts;
if missing(counts) then counts = 0;
end;
run;
The first part works great. I changed the if-then-else block however to this construct:
above_42 = ifn(speed>= 15, above_42 + 1,0);
above_55 = ifn(speed>= 40, above_55 + 1,0);
what is the purpose of "if missing(above_42) then above_42=0;
I would have expected this to be 'if missing the speed variable, then set the counter = 0'.
everything else is the same. Thanks!
if missing(above_42) then above_42=0;
This just rewrites initial missings (before the counters actually kick in) to zero. If your solution works, and plays well with the second block of code I posted, then stick with that.
I would propose to start with:
data speed;
input dt :datetime. speed;
format dt datetime.;
datalines;
11DEC16:23:53:21 35.00
11DEC16:23:53:22 39.00
11DEC16:23:53:23 45.00
11DEC16:23:53:24 55.00
11DEC16:23:53:25 56.00
11DEC16:23:53:26 58.00
11DEC16:23:53:27 56.00
11DEC16:23:53:28 55.00
11DEC16:23:53:29 50.00
11DEC16:23:53:30 40.00
11DEC16:23:53:31 53.00
11DEC16:23:53:32 55.00
11DEC16:23:53:33 56.00
11DEC16:23:53:34 58.00
;
data want;
array above{2} _temporary_ (42, 55);
array start{2} _temporary_;
array stop{2} _temporary_;
set speed end=done;
do i = 1 to dim(above);
limit = above{i};
begin = start{i};
end = stop{i};
if start{i} then do;
if speed < above{i} then do;
output;
call missing(start{i});
end;
else if done then do;
end = dt;
output;
end;
end;
else if speed >= above{i} then start{i} = dt;
stop{i} = dt;
end;
format begin end datetime.;
keep limit begin end;
run;
proc print data=want noobs; run;
limit begin end 55 11DEC16:23:53:24 11DEC16:23:53:28 42 11DEC16:23:53:23 11DEC16:23:53:29 42 11DEC16:23:53:31 11DEC16:23:53:34 55 11DEC16:23:53:32 11DEC16:23:53:34
From there, it's only a matter of summing and counting.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.