The purpose of this post is to learn about the different kinds of anomalies that occur in time series data and how to evaluate the algorithms that detect them. This is an introduction to the topic, so the focus is on the basic approach to anomaly detection rather than the details of individual algorithms. Specifically, we will define the main types of time series anomalies and the evaluation metrics used to judge time series anomaly detection models.
A second post in this series will explore the broad families of algorithms used for time series anomaly detection.
Anomalies in Time Series Data – Point, Subsequence, and Contextual Anomalies
In tabular data analysis an anomaly is a single observation (a row of data) with unusual or unexpected values for some or all the variables (columns) in the dataset. This definition is intentionally vague because anomalies are context-dependent, but the basic idea is that the anomaly doesn’t fit with the pattern of the “normal” data. This means that characterizing the pattern of “normal” (non-anomalous) data is an important task in any anomaly detection approach. A common approach to find statistical outliers in datasets is to look for points that are more than three standard deviations from the mean (a three-sigma rule) and declare these points as outliers. This approach is only sensible if the underlying data is approximately normally distributed, which represents our “characterization of the pattern of normal data” mentioned previously. Any attempt to define or detect anomalies will require assumptions about the “normal” data and how it is generated.
Time series data have additional structure beyond tabular data, and thus anomalies in time series data can be more complex than anomalies in tabular data. Time series data are ordered, with the time ID variable defining the time step and the ordering for the data. This means that collections of adjacent points in time series data can be used to define meaningful subsequences in the time series. Time series anomalies can be broadly categorized into the following three types of anomalies:
Point Anomalies: a single observation with an unusual value, such as a sudden spike or dip in the series.
Subsequence Anomalies: a consecutive run of observations that is anomalous as a group, even if each individual value might look normal on its own.
Contextual Anomalies: observations that fall within the normal range of the series overall but are unusual for their context, for example a value that would be normal at the trough of a seasonal cycle appearing at its peak.
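A minimal Python sketch can make the three types concrete. The series below is made up for illustration (the time indices loosely mirror the simulated dataset used later in this post, but the values are hypothetical):

```python
import math

# Hypothetical toy series: a smooth cycle with one of each anomaly type injected
n = 200
series = [math.sin(2 * math.pi * t / 50) for t in range(n)]

series[20] += 5.0           # point anomaly: a single extreme spike
for t in range(60, 71):     # subsequence anomaly: the cycle flatlines for 11 steps
    series[t] = 0.0
series[112] = -1.0          # contextual anomaly: -1 is within the normal range of
                            # the series, but t=112 falls near a cycle peak (~+1)
```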
Evaluating Anomaly Detection Models – Range-Based Precision and Recall
Precision and recall are traditionally used to evaluate the effectiveness of anomaly detection algorithms, but they have limitations when working with time series data containing subsequence anomalies.
High values of precision and recall indicate an effective anomaly detection algorithm, but there is usually a tradeoff between the two when working with real data. A model with high precision rarely raises false alarms: most of the points it flags are genuine anomalies, though it may still miss many real anomalies if its recall is low. A model with high recall finds most of the real anomalies in the data, though it may also raise many false alarms if its precision is low. Many anomaly detection algorithms produce a continuous anomaly score, and a threshold on that score determines which points are flagged as anomalies. Tuning this threshold helps balance precision and recall: raising the threshold typically increases precision (fewer false positives), while lowering it typically increases recall (fewer missed anomalies).
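The precision/recall tradeoff is easy to see with a small Python sketch. The anomaly scores and labels below are hypothetical:

```python
def precision_recall(scores, labels, threshold):
    """Point-wise precision and recall for a score-threshold detector."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical anomaly scores and ground-truth labels (1 = real anomaly)
scores = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7, 0.3, 0.6]
labels = [0,   1,   0,   1,   0,   1,   0,   0  ]

for thr in (0.5, 0.75):
    p, r = precision_recall(scores, labels, thr)
    print(f"threshold={thr}: precision={p:.2f}, recall={r:.2f}")
```

Raising the threshold from 0.5 to 0.75 drops the one false positive (precision rises from 0.75 to 1.00) but also drops one real anomaly (recall falls from 1.00 to 0.67).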
The basic definitions of precision and recall don’t include any consideration of anomalies that span more than a single data point (subsequence anomalies). We can instead use range-based precision and recall, which calculate the overlap between real anomaly ranges and predicted anomaly ranges. The basic idea is to calculate a precision/recall value for each anomaly range and then average these values across all ranges to get overall values for precision and recall. This approach balances existence (detecting any portion of the anomalous range) with size, position, and cardinality (detecting the right length and alignment of the anomalous range).
The details of calculating range-based precision and recall are a bit complex, so let’s look at an example of manually calculating recall from a toy anomaly detection model containing both point and subsequence anomalies. We start with a simulated dataset containing 3 point anomalies and 2 subsequence anomalies and we use a simple threshold value to detect anomalies in the data (this isn’t an effective algorithm for anomaly detection, but it will help us learn about calculating range-based recall).
/*import the simulated time series dataset with fake anomalies*/
proc import datafile="path_to_data/simulated_timeseries_with_anomalies.csv"
out=simTS
dbms=csv
replace;
run;
/*plot the data to inspect the anomalies*/
proc sgplot data=simTS;
series x=time y=value;
xaxis grid;
yaxis grid;
run;
/*note the following "True" anomalies in the data:
Point Anomalies at time=20, time=100, and time=150
Subsequence Anomalies at time=(60-70) and time=(170-180)*/
data toy_anomaly_detection;
   set simTS;
   detected_anomaly = 0;
   true_anomaly = 0;
   if abs(value) > 2 then detected_anomaly = 1;
   if time in (20, 100, 150) then true_anomaly = 1;
   if (60 <= time <= 70) or (170 <= time <= 180) then true_anomaly = 1;
run;
title "Detected Anomalies";
proc sgplot data=toy_anomaly_detection;
styleattrs DATACOLORS=(verylightgrey lightred)
DATALINEPATTERNS=(solid dot);
block x=time block=detected_anomaly / transparency=0.75;
series x=time y=value;
xaxis grid;
yaxis grid;
run;
title "True Anomalies";
proc sgplot data=toy_anomaly_detection;
styleattrs DATACOLORS=(verylightgrey lightgreen)
DATALINEPATTERNS=(solid dot);
block x=time block=true_anomaly / transparency=0.75;
series x=time y=value;
xaxis grid;
yaxis grid;
run;
Evidently our anomaly detection algorithm does not capture all the ‘real’ anomalies in the data. Now we calculate range-based recall for this ‘algorithm’. (Precision is calculated similarly, but it omits the existence-reward term and computes the overlap reward over the predicted ranges rather than the real ones.) Note that the SAS DATA step code below is an inefficient way to calculate range-based recall, but it illustrates the core concepts. We also make some simplifying assumptions, setting the cardinality factor to 1 and the positional bias factor to 1 (basically we ignore some tunable parameters that allow us to adjust range-based recall). For more details please see the paper on range-based precision and recall in the references.
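Under these simplifying assumptions (cardinality factor = 1, flat positional bias), the per-range recall reduces to alpha × existence + (1 − alpha) × overlap fraction. A minimal Python sketch of that reduced formula, using hypothetical ranges rather than the simulated dataset:

```python
def range_based_recall(real_ranges, detected_times, alpha=0.5):
    """Simplified range-based recall (cardinality factor = 1, flat bias).

    real_ranges    -- list of (start, end) inclusive time ranges of true anomalies
    detected_times -- set of time steps flagged as anomalous by the detector
    """
    per_range = []
    for start, end in real_ranges:
        length = end - start + 1
        hits = sum(1 for t in range(start, end + 1) if t in detected_times)
        existence = 1.0 if hits > 0 else 0.0   # reward for detecting anything at all
        overlap = hits / length                # reward for how much of the range is covered
        per_range.append(alpha * existence + (1 - alpha) * overlap)
    return sum(per_range) / len(real_ranges)

# Hypothetical example: a point anomaly fully detected, plus a subsequence
# anomaly at times 60-70 detected only at times 60-65
real = [(20, 20), (60, 70)]
detected = {20, 60, 61, 62, 63, 64, 65}
print(round(range_based_recall(real, detected), 4))  # → 0.8864
```

The point anomaly contributes a per-range recall of 1.0; the partially detected subsequence contributes 0.5 × 1 + 0.5 × (6/11) ≈ 0.77, and the overall recall is the average of the two.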
/*now we calculate the range-based recall*/
/*first we identify detected vs real anomalies*/
data range_based_recall;
set toy_anomaly_detection;
if _N_ = 1 then anomaly_count = 0;
if true_anomaly = 0 then anomaly_id = 0;
if (true_anomaly=1 and lag(true_anomaly)=0) then anomaly_count+1;
if true_anomaly = 1 then anomaly_id = anomaly_count;
call symputx('num_anomalies',anomaly_count);
run;
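The range-labeling step above (assign an increasing ID to each consecutive run of 1s in the true-anomaly flag, and 0 elsewhere) can be sketched in Python as:

```python
def label_ranges(flags):
    """Assign an increasing ID to each consecutive run of 1s; 0 elsewhere."""
    ids, count, prev = [], 0, 0
    for f in flags:
        if f == 1 and prev == 0:   # a new anomaly range starts here
            count += 1
        ids.append(count if f == 1 else 0)
        prev = f
    return ids

print(label_ranges([0, 1, 0, 1, 1, 1, 0, 1]))  # → [0, 1, 0, 2, 2, 2, 0, 3]
```

The final ID equals the total number of anomaly ranges, which plays the role of the `num_anomalies` macro variable in the SAS code.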
/*next we calculate the assessment*/
data range_based_recall;
set range_based_recall;
/*recall includes rewards for existence and overlap*/
/*alpha determines the balance between existence and overlap in the calculation*/
alpha = 0.5;
existence_reward = 0;
overlap_set = 0;
retain max_overlap;
retain total_recall 0;
do i=1 to &num_anomalies;
if i=anomaly_id then do;
if detected_anomaly=true_anomaly then do;
existence_reward = 1;
overlap_set = anomaly_count;
end;
end;
end;
/*calculate overlap reward*/
if (anomaly_id ^=0 and lag(anomaly_id) = 0) then max_overlap = 1;
if (anomaly_id = 0 and lag(anomaly_id) ^= 0) then max_overlap = 0;
if (anomaly_id ^=0 and lag(anomaly_id) ^= 0) then max_overlap+1;
if (overlap_set ^=0 and lag(overlap_set) = 0) then overlap = 1;
if (overlap_set = 0 and lag(overlap_set) ^= 0) then overlap = 0;
if (overlap_set ^=0 and lag(overlap_set) ^= 0) then overlap+1;
if max_overlap ^= 0 then overlap_reward = overlap / max_overlap;
else overlap_reward = 0;
recall = alpha*existence_reward + (1-alpha)*overlap_reward;
run;
data range_based_recall;
set range_based_recall;
where anomaly_id ^= 0;
by anomaly_id;
if last.anomaly_id then do;
final_recall = recall;
end;
run;
proc sql;
create table recall as
select sum(final_recall) as total
from range_based_recall;
quit;
data recall;
set recall;
keep recall;
recall = total / &num_anomalies;
run;
proc print data=recall;
run;
This yields a range-based recall value of 0.7636. If you are interested in using range-based precision and recall to evaluate time series anomaly detection models you can also use the Python package “PRTS” (see references) to calculate it for you and avoid the manual calculation above.
Now that we have established what kinds of anomalies we are looking for and how we will judge the performance of the models used to find them, the next step is to explore time series anomaly detection algorithms. That will be the focus of the next post in this series, which details the broad approaches used to detect anomalies in time series.
References:
Tatbul, N., Lee, T. J., Zdonik, S., Alam, M., and Gottschlich, J. "Precision and Recall for Time Series." Advances in Neural Information Processing Systems (NeurIPS), 2018.
PRTS: a Python package for computing range-based precision and recall for time series.
Find more articles from SAS Global Enablement and Learning here.