With SAS ESP you can use unsupervised models to detect, in real time, changes in the conditions of a chemical process.
Combining PCA with Change Detection (CD) lets you monitor the complex relationships of a high-dimensional data stream (one with many sensors): the stream is reduced to a principal component that captures most of the variance, and that component is monitored with a change detection algorithm based on the Kullback-Leibler (KL) divergence. In the image below, you can see how approachable it is to design a SAS ESP project that contains the PCA and the CD algorithms.
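To make the dimension-reduction step concrete, here is a minimal Python sketch (the actual SAS ESP project is built with the windows shown in the image, not with this code; the function name is illustrative) that reduces a multi-sensor window to the scores of its first principal component:

```python
import numpy as np

def project_on_pc1(window):
    """Reduce a (n_obs x n_sensors) window to the scores of its first
    principal component, the direction carrying most of the variance."""
    centered = window - window.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    pc1 = eigvecs[:, np.argmax(eigvals)]   # eigenvector with the largest eigenvalue
    return centered @ pc1

# Illustrative data: 500 observations from 10 simulated sensors.
rng = np.random.default_rng(0)
window = rng.normal(size=(500, 10))
scores = project_on_pc1(window)
print(scores.shape)  # one score per observation
```

The single `scores` series is what the change detection algorithm then monitors, instead of ten separate sensor streams.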
In the video, I demonstrate how to configure SAS Event Stream Processing to run this analysis on the Tennessee Eastman Process.
While PCA is a standard part of the data scientist's toolkit, combining it with the KL divergence is very practical because the KL divergence compares histograms of the distribution of the principal component's values. In the figure below, you can see the angle of the first principal component generated by the moving-window PCA in SAS ESP, first on the initial 500 observations (the reference period) and then on the whole dataset of 1,500 observations (with a fault occurring around the 600th observation). While the graphic is easy to interpret thanks to the sharp drop in the absolute angle of the PCA, note that the change in value is very small and impossible to estimate a priori.
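For readers who want to reproduce something like this figure outside of ESP, here is a sketch of a moving-window PC1 angle in Python. The exact angle convention used by SAS ESP's PCA window isn't specified here, so this is an assumption: the angle between the first loading vector and the first sensor axis, taken in absolute terms.

```python
import numpy as np

def pc1_angle_deg(window):
    """Angle (degrees) between the first principal component of a data
    window and the first sensor axis. An eigenvector's sign is arbitrary,
    so the absolute value keeps the angle in [0, 90]."""
    centered = window - window.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, np.argmax(eigvals)]
    return float(np.degrees(np.arccos(min(1.0, abs(v[0])))))

def moving_window_pc1_angle(data, size=100):
    """Recompute the PC1 angle on each trailing window of `size` rows,
    mimicking a moving-window PCA over the stream."""
    return np.array([pc1_angle_deg(data[i - size:i])
                     for i in range(size, len(data) + 1)])
```

When a fault changes the correlation structure of the sensors, the dominant direction of variance rotates and the angle series shifts, which is exactly the kind of drop the figure shows.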
When using the change detection algorithm, SAS ESP constructs a uniform histogram on a reference period (the first 500 observations) and compares the distribution of that histogram with the one over the whole period. In the implementation, we use a shorter reference period to ensure that only normal process behavior is represented in the histogram. For illustration purposes, I created this visual comparing 4 bins on the absolute angle of the first principal component. You will notice that soon after the reference period, most of the observations fall into the 4th bin. The change detection algorithm captures this change.
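One way to sketch this in Python, under the assumption that "uniform histogram" means bin edges chosen at quantiles of the reference period (so each bin holds an equal share of the reference observations):

```python
import numpy as np

def uniform_reference_edges(reference, bins=4):
    """Bin edges chosen so the reference period fills every bin equally,
    i.e. the reference histogram is uniform by construction."""
    return np.quantile(reference, np.linspace(0, 1, bins + 1))

def kl_vs_reference(values, edges):
    """KL divergence of the observed bin proportions from the uniform
    reference proportions (a small epsilon guards empty bins)."""
    bins = len(edges) - 1
    # Assign each value to a bin; observations beyond the reference range
    # fall into the first or last bin.
    idx = np.clip(np.searchsorted(edges, values, side="right") - 1, 0, bins - 1)
    counts = np.bincount(idx, minlength=bins).astype(float)
    p = (counts + 1e-9) / (counts.sum() + bins * 1e-9)
    q = np.full(bins, 1.0 / bins)
    return float(np.sum(p * np.log(p / q)))
```

On the reference period itself the KL value is essentially zero; once the fault pushes most observations into the 4th bin, the observed proportions diverge from the uniform reference and the KL value jumps.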
When you have no prior knowledge of the expected values of the sensors or of the PCA, the KL divergence offers a standard metric whose threshold the user can decide a priori. Choosing the optimal decision boundary for the KL divergence is not easy, but when the number of bins is fixed to a small value, and because the reference distribution is always uniform, you can visualize how much deviation is acceptable.
In the example below, if you don't want to detect a change smaller than the one on the left, but you do want to detect a change similar to or larger than the one on the right, you can set the threshold for the KL value between 0.0537 and 0.5402.
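This kind of threshold exploration is easy to script. The sketch below uses hypothetical 4-bin proportions (not the exact distributions behind the figure's 0.0537 and 0.5402 values) to compute the KL divergence from the uniform reference for a "tolerated" and a "flagged" deviation:

```python
import numpy as np

def kl_from_uniform(p):
    """KL divergence of bin proportions p from the uniform distribution
    over the same number of bins (0 * log 0 is taken as 0)."""
    p = np.asarray(p, dtype=float)
    q = np.full(len(p), 1.0 / len(p))
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical 4-bin proportions: a mild deviation you want to tolerate
# and a strong deviation you want to flag.
tolerated = [0.30, 0.25, 0.25, 0.20]
flagged = [0.05, 0.10, 0.15, 0.70]
lo, hi = kl_from_uniform(tolerated), kl_from_uniform(flagged)
print(f"set the KL threshold between {lo:.4f} and {hi:.4f}")
```

Any threshold between the two computed values separates the deviations you tolerate from the ones you want to alarm on.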
You can play with different distributions and bin counts to choose the optimal boundary for your use case. I have included the SAS code that calculates the KL divergence value and creates the graphs above.
Tom.
For more information about the Kullback-Leibler divergence: https://blogs.sas.com/content/iml/2020/05/26/kullback-leibler-divergence-discrete.html