
Real-Time Detection of Anomalies with Short-Time Fourier Transform using SAS Event Stream Processing

Started 12-18-2023
Modified 12-18-2023

This post shows how to detect anomalous frequencies in streaming signal data using SAS Event Stream Processing. We implement the streaming Short-Time Fourier Transform (STFT) algorithm using the Calculate Window, and we explore and interpret the results of the streaming algorithm. Illustrative screenshots of the SAS Event Stream Processing Studio graphical interface are included throughout the post, and a full XML project containing all the work described here is provided at the end as a reference.

 

Deploying Models in SAS Event Stream Processing

 

SAS Event Stream Processing lets us analyze real-time data streams and react immediately to patterns in the data. This is useful for anomaly detection because we usually want to take some action after finding an anomaly, and the time available to act can be limited. If we detect an anomaly on a manufacturing line, we may want to shut down the line before damage to the products starts to accumulate. If we react too late, we may lose much of our production to damage, so the faster we can identify the anomaly and shut down the line, the more profitable the business will be.

SAS Event Stream Processing can deploy an anomaly detection model in real time, so we can go directly from streaming in data to detecting anomalies and reacting to them. We can even use it to send a shutdown signal to the manufacturing line as soon as an anomaly is detected (assuming we can interact with the line through an API).

Previous posts on the topic of anomaly detection provide background for the concepts and algorithms discussed in this post. In particular, they introduce some basic approaches to anomaly detection and the Short-Time Fourier Transform algorithm.

 

 

Here we will be focused on using SAS Event Stream Processing software to implement the Short-Time Fourier Transform on streaming signal data.

 

Real-Time Detection of Anomalies using Short-Time Fourier Transforms

 

The short-time Fourier transform allows us to decompose signals into frequency components over time. This decomposition can reveal when unwanted frequency components appear in the data. A common example of this is the unwanted appearance of high-frequency noise in the data. This could be the result of unwanted vibrations in mechanical devices, or noise coupling in circuits. Regardless of how the noise enters the system, we want to identify when the noise appears and flag the times where the signal contains the unwanted noise. We may also want to use digital filters to remove the unwanted noise, but for now we will focus on detecting the anomalies.

 

In a previous post (Identifying Anomalous Frequencies in Signal Data with Short-Time Fourier Transform using SAS/IML) we learned how to apply the STFT in SAS/IML with the SPECTROGRAM call. Here we follow up on that problem and deploy our anomaly detection method in real time using SAS Event Stream Processing. As a reminder, we had a 30 Hz signal with high-frequency noise added for a very short time, and we want to detect when this high-frequency noise appears. The spectrogram we plotted for this signal revealed that the high-frequency noise was localized around 200 Hz right after the 1 second mark in the signal:

 

01_arziti_noiseSTFT_1.png


 

The patch of orange points around 200 Hz after 1.0 seconds reveals the anomalous frequency added to the data. We can characterize the anomaly as any frequency component above about 100 Hz (allowing for spectral leakage from the normal 30 Hz signal) with non-negligible power. In the power spectrum, the log(Power) values above 100 Hz are all below -4 except for the anomalous frequency components. We can therefore choose a threshold for our anomaly detection model and identify any frequency component above 100 Hz with log(Power) greater than -4 as an anomaly. The basic idea is that the 30 Hz signal is the normal data, and anything that deviates significantly from the normal data (like the burst of 200 Hz noise we see in the spectrogram) is an anomaly. Our goal is to output real-time information about when the anomaly occurs in the data and at which frequencies it occurs (knowing that in this example frequencies below 100 Hz are non-anomalous).
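To make the idea concrete outside of SAS, here is a minimal offline sketch in plain Python. It is not the SAS Event Stream Processing or SAS/IML algorithm. It assumes a 500 Hz sampling rate (inferred from the 250 Hz Nyquist value used in the project XML), a 0.1 second burst of 200 Hz noise with amplitude 0.5, a Hann window, and an arbitrary power normalization, so the 0.1 threshold below is specific to this sketch and does not transfer to the SAS output.

```python
import cmath
import math

FS = 500.0   # assumed sampling rate in Hz (Nyquist = 250 Hz)
WINDOW = 25  # samples per analysis window
HOP = 15     # assumed hop: window length 25 minus an overlap of 10 samples
NFFT = 64    # zero-padded transform length


def make_signal(n=1000):
    """30 Hz tone with an assumed 0.1 s burst of 200 Hz noise starting at t = 1 s."""
    x = []
    for i in range(n):
        t = i / FS
        v = math.sin(2 * math.pi * 30 * t)
        if 1.0 <= t < 1.1:
            v += 0.5 * math.sin(2 * math.pi * 200 * t)
        x.append(v)
    return x


def stft_power(x):
    """Return {frame start index: power per bin 0..NFFT/2} using a Hann window."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * k / (WINDOW - 1)) for k in range(WINDOW)]
    frames = {}
    for start in range(0, len(x) - WINDOW + 1, HOP):
        seg = [x[start + k] * hann[k] for k in range(WINDOW)]
        powers = []
        for b in range(NFFT // 2 + 1):
            s = sum(seg[k] * cmath.exp(-2j * math.pi * b * k / NFFT) for k in range(WINDOW))
            powers.append(abs(s) ** 2 / WINDOW)  # arbitrary normalization
        frames[start] = powers
    return frames


frames = stft_power(make_signal())
bin_hz = FS / NFFT  # frequency spacing between bins

# Largest power found above 100 Hz in each frame.
high = {
    start: max(p for b, p in enumerate(pw) if b * bin_hz > 100)
    for start, pw in frames.items()
}

# Frame start times (in seconds) where the high band exceeds the sketch threshold.
burst_frames = sorted(start / FS for start, peak in high.items() if peak > 0.1)
print(burst_frames)
```

Only frames that overlap the burst should be flagged, which is the same behavior we want from the streaming project.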

 

We start by creating a project in SAS Event Stream Processing Studio and adding some relevant windows to the project. As we walk through the project, we will see screenshots of portions of it in SAS Event Stream Processing Studio. The screenshots are included to help readers learn to use SAS Event Stream Processing Studio, not as a way to reproduce the steps in this post. SAS Event Stream Processing Studio projects are based on XML, so the graphical interface generates XML behind the scenes to represent the project. The full XML for the project discussed in this post is provided at the end in the Reference Materials section. To explore the project on your own, you can import the XML into SAS Event Stream Processing Studio and navigate it using the graphical interface.

 

02_arziti_ESPPipelineSTFT.png

 

The first window in the project (readDigitalSignal) is a Source Window and is used to read the input data. In a real project this window would read data from a sensor device or a website that is designed to output streaming data, but in this toy example we read the signal data from a CSV file. We use the File/Socket Connector to read the CSV file into the source window, and the source window outputs the information read from the CSV file, which in this case is just the time variable, t, and the input signal, x_in. The output schema for the source window shows that the time variable t is used as the Key Field for the streaming project.

 

03_arziti_ESPSourceSchema.png

 

The second window in the project (shortTimeFourierTransform) is a Calculate Window and is used to perform the STFT on the streaming data. The calculate window will use the input streaming data (in this case the time variable t and the input signal, x_in) and the STFT algorithm to calculate the power across frequency bins in each time window. The STFT settings like the window length/type and the overlap between windows can be set using the SAS Event Stream Processing Studio graphical interface (hardcoded), or they can be set by referring to an input configuration file living on the operating system. In this example we hardcode the STFT parameter settings:

 

04_arziti_STFTWindowSettings.png
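As a rough guide to what these settings imply, the following sketch computes the time and frequency resolution from the values in the project XML (windowLength 25, overlap 10, fftLength 64). It assumes the overlap is expressed in samples and that the signal is sampled at 500 Hz (inferred from the 250 Hz Nyquist value used later in the project); neither assumption is confirmed in this post, and the function name is purely illustrative.

```python
# Values copied from the project XML at the end of the post.
WINDOW_LENGTH = 25
OVERLAP = 10      # assumed to be a number of samples
FFT_LENGTH = 64
FS_HZ = 500.0     # assumed sampling rate (Nyquist = 250 Hz)


def stft_resolution(window_length, overlap, fft_length, fs_hz):
    """Summarize how far the window advances and how finely frequency is sampled."""
    hop = window_length - overlap
    return {
        "hop_samples": hop,
        "hop_seconds": hop / fs_hz,
        "window_seconds": window_length / fs_hz,
        "bin_spacing_hz": fs_hz / fft_length,
    }


res = stft_resolution(WINDOW_LENGTH, OVERLAP, FFT_LENGTH, FS_HZ)
print(res)
```

Under these assumptions each window covers 0.05 seconds of signal and adjacent frequency bins are 7.8125 Hz apart, which is fine enough to separate the 30 Hz signal from noise near 200 Hz.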

 

The advantage of using a configuration file is the ability to modify settings without altering the project, allowing the same anomaly detection project to be reused in different settings and environments. The calculate window will output the power and phase in each frequency bin for each time window. Notice in the output schema that the combination of the time and bin fields is used as the Key Field:

 

05_arziti_STFTOutputSchema.png

 

A key requirement for using the STFT algorithm in SAS Event Stream Processing is to specify a map between the STFT algorithm and the input/output fields in the schema. The input map is simple: we specify that the field t is used as the time ID variable and the field x_in is used as the input signal:

 

06_arziti_STFTInputMap.png

 

For the output map we must specify where to store the STFT results. This can be a bit confusing because there are two ways to store the outputs: a list format and a non-list format. We use the non-list format in this example:

 

07_arziti_STFTOutputMap.png

 

In the output map above, we leave the keyOut Role blank because we use a combination of the timeIdOut and the binOut variables as the key variable (as we saw in the output schema earlier). We also leave the powerListOut and phaseListOut Roles blank because we are not using the list format. This is important since we will get a confusing error if we try to run the pipeline with all the output roles specified. Using the Output Map above we generate an output event for each time/bin combination, meaning with 32 bins we will get 32 output events for each time point. The list format instead outputs a single event for each time point, with a list of power and phase values for each frequency bin. This is a minor difference in format, but when configuring the SAS Event Stream Processing project, it is important to choose a single output format and stick with it to avoid errors and confusion.
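The relationship between the two output formats can be sketched in plain Python. The dictionaries below only illustrate the shape of the events; the field names powerList and phaseList and the sample values are hypothetical and are not taken from SAS Event Stream Processing output.

```python
def explode(list_event):
    """Turn one list-format event into one event per frequency bin (bins start at 1)."""
    pairs = zip(list_event["powerList"], list_event["phaseList"])
    return [
        {"time": list_event["time"], "bin": b, "power": p, "phase": ph}
        for b, (p, ph) in enumerate(pairs, start=1)
    ]


# Hypothetical list-format event with only 3 bins to keep the example short.
list_event = {
    "time": 510,
    "powerList": [1.40, 0.02, 0.35],
    "phaseList": [0.10, -1.20, 2.30],
}

events = explode(list_event)
for e in events:
    print(e)
```

In the non-list format the pair (time, bin) identifies each event uniquely, which is why the output schema uses both fields as keys.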

 

The third window in the project (binToFreq) is a Compute Window that converts the frequency bin numbers (integers counting the bins from 1 to 32 in this example) into actual frequencies in Hertz. We do this by multiplying the bin number by the Nyquist frequency (half the sampling frequency, 250 Hz in this example) and then dividing by the total number of bins in the schema. This creates a new field in the data containing the frequencies in Hertz.
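The Compute Window expression in the project XML is (bin * 250) / 32, and its description identifies 250 as the Nyquist frequency. A small Python equivalent, with illustrative function and parameter names, shows which bins fall above the 100 Hz cutoff used later:

```python
def bin_to_freq(bin_number, nyquist_hz=250.0, bins_in_schema=32):
    """Convert an STFT bin number (1..bins_in_schema) to a frequency in Hz."""
    return bin_number * nyquist_hz / bins_in_schema


# Lowest bin whose frequency exceeds the 100 Hz cutoff.
first_high_bin = min(b for b in range(1, 33) if bin_to_freq(b) > 100)

print(bin_to_freq(1))    # spacing between adjacent bins
print(bin_to_freq(32))   # highest bin sits at the Nyquist frequency
print(first_high_bin, bin_to_freq(first_high_bin))
```

With these values each bin is 7.8125 Hz wide, so bins 13 through 32 are the ones that can trigger the noise filter.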

 

The fourth and final window in the project (noiseDetection) is a Filter Window that filters the incoming stream so that events are output only when anomalous signals are detected. In this case it detects the presence of high-frequency noise (signal components above 100 Hz with power greater than 0.01). The output schema for the filter window is the same as the input schema; we simply filter events based on the expression specified below in the filter window (notice how we use the field freq created by the compute window as a filter variable):

 

08_arziti_ESPFilter.png
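The filter expression in the project XML is (freq > 100) and (power > 0.01). The same rule written as a Python predicate looks like this; the sample events are made up for illustration and are not output from the project.

```python
def is_noise(event, min_freq_hz=100.0, min_power=0.01):
    """Mirror of the filter expression: (freq > 100) and (power > 0.01)."""
    return event["freq"] > min_freq_hz and event["power"] > min_power


# Hypothetical events in the schema produced by the binToFreq window.
sample_events = [
    {"time": 510, "bin": 4, "power": 1.4, "freq": 31.25},       # normal 30 Hz signal
    {"time": 510, "bin": 26, "power": 0.35, "freq": 203.125},   # strong high-frequency power
    {"time": 300, "bin": 26, "power": 0.001, "freq": 203.125},  # negligible leakage
]

flagged = [e for e in sample_events if is_noise(e)]
print(flagged)
```

Both conditions must hold, so strong power at low frequencies and weak leakage at high frequencies are both ignored.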

 

In this simple example we flag the noise using rules we learned from examining the spectrogram, but we could also use more sophisticated tools that find peaks in the spectrogram to identify anomalous frequencies. SAS Event Stream Processing has a peak finding algorithm in the Calculate Window that can look for peaks in plots like the spectrogram and then output the time at which each peak occurred and the signal amplitude at the peak. In a real deployment we would want to send the detected anomaly to a downstream system, either through an API call or message, or by writing output data (like a CSV file) that the downstream system consumes.

 

We can test the SAS Event Stream Processing project by saving it and then selecting Enter Test Mode in SAS Event Stream Processing Studio. This allows us to see how the anomaly detection pipeline will work using the simulated signal data, and it will allow us to test our project to make sure all the settings work. Once in test mode we can run a test deployment to see how the CSV data looks when it is streamed through the pipeline:

 

09_arziti_ESPTest_readDigitalSignal.png

 

We can see that the source window reads in the data from the CSV file and assigns an Opcode to each observation in the data, creating events in SAS Event Stream Processing.

 

10_arziti_ESPTest_shortTimeFourierTransform.png

 

The Calculate Window outputs the STFT results, which in this case is an event for each combination of time and frequency bin (if we were using the list format for the STFT algorithm we would get different results). Notice that the power in each frequency bin is output on a linear scale, whereas spectrogram plots normally use a log or decibel scale for spectral power. Keep this in mind when comparing the output of the STFT algorithm in SAS Event Stream Processing to the output of the SPECTROGRAM call in SAS/IML.
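When comparing linear power values against a log-scale plot, a conversion such as the following can help. This sketch uses the standard decibel definition, 10 * log10(power); which log base and normalization the earlier SAS/IML spectrogram used is not stated here, so treat the numbers only as an illustration of the scale change.

```python
import math


def power_to_db(power):
    """Convert linear power to decibels; non-positive power maps to negative infinity."""
    if power <= 0:
        return float("-inf")
    return 10 * math.log10(power)


threshold_db = power_to_db(0.01)  # the linear filter threshold used in this project
print(threshold_db)               # about -20 dB
print(math.log10(0.01))           # about -2 on a base-10 log scale
print(math.log(0.01))             # about -4.6 on a natural log scale
```

A threshold chosen by eye on a log-scale spectrogram therefore has to be converted before it is used in a filter on linear power.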

 

11_arziti_ESPTest_noiseDetection.png

 

The binToFreq Compute Window just converts the bin number to a frequency in Hz, and the noiseDetection Filter Window is used to select only those events that contain unwanted high-frequency noise. We can see that while the previous windows had 1000 rows each, the noiseDetection window only has 22 rows, corresponding to the 22 output events where high frequency noise was detected. This doesn’t mean 22 time points contained high frequency noise, since each time point can have multiple events corresponding to the different frequency bins with unwanted noise. At this point in the project, we could use a subscriber connector to write out the results or to connect to another downstream system that uses the time information in the output to perform some kind of business task. A simple example would be a system where the input signal is voltage measured on a circuit in a wind turbine, and when high frequency voltage is measured in the circuit, we send a signal to the circuit to connect to a ground and discharge the high-frequency voltage. In this example we are trying to protect the turbine generator from the negative effects of unwanted voltage components while still maximizing the uptime of the turbine power generation.
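To see how many distinct time points are affected, the filtered events can be grouped by their time key. The sketch below uses a few made-up (time, bin) pairs rather than the 22 events from the test run.

```python
from collections import defaultdict


def bins_by_time(events):
    """Group flagged events by time so each time point lists its noisy bins."""
    grouped = defaultdict(list)
    for time, bin_number in events:
        grouped[time].append(bin_number)
    return dict(grouped)


# Hypothetical (time, bin) keys of events that passed the noise filter.
flagged_keys = [(510, 25), (510, 26), (510, 27), (525, 25), (525, 26)]

grouped = bins_by_time(flagged_keys)
print(len(flagged_keys), "events across", len(grouped), "time points")
print(grouped)
```

A downstream system that only needs to know when noise is present can act on the distinct time values instead of on every event.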

 

This is just one way we can apply the STFT algorithm to input signal data (we can also use SAS/IML and SAS Visual Forecasting procedures), but it is the best way to perform an STFT on streaming data, especially if we need to react to the anomalies in a time-sensitive way. This demonstration does not include the final step of reacting to the anomalies in real time, but when we are detecting anomalies in a manufacturing setting, that step is usually as simple as sending a shutdown signal to a programmable logic controller (PLC) device on the manufacturing line. The STFT is also just one algorithm we can use to detect anomalies in real time. A subsequent post will discuss deploying support vector data description models in SAS Event Stream Processing. That approach differs from the current one because we will need to deploy a model using a scoring artifact (an ASTORE file) instead of just using an online algorithm in the Calculate Window.

 

Reference Materials - STFT_Anomaly_Detection SAS Event Stream Processing Project XML:

 

<project name="STFT_Anomaly_Detection" threads="1" pubsub="auto" heartbeat-interval="1">
  <metadata>
    <meta id="studioUploadedBy">student</meta>
    <meta id="studioUploaded">1685034102948</meta>
    <meta id="studioModifiedBy">student</meta>
    <meta id="studioModified">1686236532700</meta>
    <meta id="layout">{"cq1":{"binToFreq":{"x":-80,"y":20},"noiseDetection":{"x":-80,"y":155},"readDigitalSignal":{"x":-80,"y":-255},"shortTimeFourierTransform":{"x":-80,"y":-120}}}</meta>
  </metadata>
  <contqueries>
    <contquery name="cq1">
      <windows>
        <window-source index="pi_EMPTY" insert-only="true" name="readDigitalSignal">
          <description><![CDATA[This window reads in the digital signal for analysis in SAS Event Stream Processing. This toy example uses a CSV file, but we will illustrate reading data from a URL as well.]]></description>
          <schema>
            <fields>
              <field name="t" type="int64" key="true"/>
              <field name="x_in" type="double"/>
            </fields>
          </schema>
          <connectors>
            <connector class="fs" name="Signal_In_CSV">
              <properties>
                <property name="type"><![CDATA[pub]]></property>
                <property name="header"><![CDATA[1]]></property>
                <property name="addcsvopcode"><![CDATA[true]]></property>
                <property name="addcsvflags"><![CDATA[normal]]></property>
                <property name="fsname"><![CDATA[/mnt/data/anomaly/simulated_signal_data_w_noise.csv]]></property>
                <property name="fstype"><![CDATA[csv]]></property>
              </properties>
            </connector>
          </connectors>
        </window-source>
        <window-calculate algorithm="STFT" index="pi_EMPTY" name="shortTimeFourierTransform">
          <description><![CDATA[This window takes the input digital signal and performs a short-time Fourier Transform to identify the frequencies in the signal as a function of time.]]></description>
          <schema>
            <fields>
              <field name="time" type="int64" key="true"/>
              <field name="bin" type="int64" key="true"/>
              <field name="power" type="double"/>
              <field name="phase" type="double"/>
            </fields>
          </schema>
          <parameters>
            <properties>
              <property name="windowLength"><![CDATA[25]]></property>
              <property name="windowType"><![CDATA[15]]></property>
              <property name="windowParam"><![CDATA[-1.0]]></property>
              <property name="fftLength"><![CDATA[64]]></property>
              <property name="overlap"><![CDATA[10]]></property>
              <property name="binsInSchema"><![CDATA[32]]></property>
            </properties>
          </parameters>
          <input-map>
            <properties>
              <property name="input"><![CDATA[x_in]]></property>
              <property name="timeId"><![CDATA[t]]></property>
            </properties>
          </input-map>
          <output-map>
            <properties>
              <property name="timeIdOut"><![CDATA[time]]></property>
              <property name="binOut"><![CDATA[bin]]></property>
              <property name="powerOut"><![CDATA[power]]></property>
              <property name="phaseOut"><![CDATA[phase]]></property>
            </properties>
          </output-map>
          <connectors>
            <connector class="fs" name="STFT_Write_CSV">
              <properties>
                <property name="type"><![CDATA[sub]]></property>
                <property name="header"><![CDATA[full]]></property>
                <property name="snapshot"><![CDATA[false]]></property>
                <property name="fsname"><![CDATA[/mnt/data/anomaly/simulated_signal_STFT.csv]]></property>
                <property name="fstype"><![CDATA[csv]]></property>
              </properties>
            </connector>
          </connectors>
        </window-calculate>
        <window-filter index="pi_EMPTY" name="noiseDetection">
          <description><![CDATA[This filter window looks for non-negligible power in the high frequency bins. Spectral leakage means power is spread out across many bins so we look for non-negligible power (above 0.01 in the linear scale) in frequency bins above 100 Hz.]]></description>
          <expression><![CDATA[(freq > 100) and (power > 0.01)]]></expression>
        </window-filter>
        <window-compute index="pi_EMPTY" name="binToFreq">
          <description><![CDATA[The STFT outputs information about the power in different frequency bins. The actual frequency corresponding to each of these bins is based on the number of bins and the sampling frequency.

frequency = (bin / binsInSchema) * nyquist_frequency]]></description>
          <schema>
            <fields>
              <field name="time" type="int64" key="true"/>
              <field name="bin" type="int64" key="true"/>
              <field name="power" type="double"/>
              <field name="phase" type="double"/>
              <field name="freq" type="double"/>
            </fields>
          </schema>
          <output>
            <field-expr><![CDATA[power]]></field-expr>
            <field-expr><![CDATA[phase]]></field-expr>
            <field-expr><![CDATA[(bin * 250) / 32]]></field-expr>
          </output>
        </window-compute>
      </windows>
      <edges>
        <edge source="readDigitalSignal" target="shortTimeFourierTransform" role="data"/>
        <edge source="shortTimeFourierTransform" target="binToFreq"/>
        <edge source="binToFreq" target="noiseDetection"/>
      </edges>
    </contquery>
  </contqueries>
</project>

 

 

Find more articles from SAS Global Enablement and Learning here.
