With SAS ESP, you can use unsupervised models to detect, in real time, changes in the operating conditions of a chemical process.
Combining PCA with Change Detection (CD) enables you to monitor the complex relationships within a high-dimensional data stream (that is, one with many sensors). PCA reduces the data stream to a principal component that captures most of the variance, and that principal component is then monitored with a change detection algorithm based on the Kullback-Leibler (KL) divergence. In the image below, you can see how approachable it is to design a SAS ESP project that contains the PCA and the CD algorithm.
In the video, I demonstrate how to configure SAS Event Stream Processing to run this analysis on the Tennessee Eastman Process.
PCA is part of the standard toolkit of data scientists, and combining it with the KL divergence is very practical because the KL divergence compares histograms of the distribution of the principal component's values. In the figure below, you can see the angle of the first principal component generated by the moving-window PCA in SAS ESP, first on the first 500 observations (the reference period) and then on the whole dataset of 1,500 observations (with a fault occurring around the 600th observation). The graph is easy to interpret because of the sharp drop in the absolute PCA angle, but note that the change in value is very small and impossible to estimate a priori.
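The post does not spell out how SAS ESP defines the angle of the first principal component. One common reading, used in the sketch below purely as an assumption, is the angle between the first loading vector of each moving window and the first loading vector of the reference period. Loadings are only defined up to sign, so the sketch compares absolute cosines and reports angles between 0 and 90 degrees. The window size, step, and function names are hypothetical, and the values are not expected to match the figure. The sketch only illustrates the idea of tracking the orientation of the first component window by window.

```python
import numpy as np

def first_loading(window):
    """First principal component loading vector of an (n_obs, n_sensors) window."""
    centered = window - window.mean(axis=0)
    # Standardize each sensor so that differences in scale do not dominate the PCA
    std = centered.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    scaled = centered / std
    cov = np.cov(scaled, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]                   # eigenvector of the largest eigenvalue

def moving_window_angles(data, ref_end=500, window=100, step=10):
    """Angle (degrees) between each window's first loading and the reference loading.

    Hypothetical definition of the "PCA angle"; not SAS ESP's documented formula.
    """
    ref = first_loading(data[:ref_end])
    angles = []
    for start in range(0, len(data) - window + 1, step):
        v = first_loading(data[start:start + window])
        # Loadings are defined only up to sign, so use |cos| and report 0-90 degrees
        cos = abs(float(np.dot(ref, v)))
        angles.append(np.degrees(np.arccos(min(cos, 1.0))))
    return np.array(angles)
```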
When using the change detection algorithm, SAS ESP constructs a uniform histogram on a reference period (the first 500 observations) and compares the distribution of that histogram with the distribution over the whole period. In the implementation, we use a relatively short reference period to ensure that only the normal process is represented in the histogram. For illustration purposes, I created this visual comparing 4 bins on the absolute angle of the first principal component. You will notice that shortly after the reference period, most of the observations fall in the 4th bin. The change detection algorithm captures this change.
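The binning and comparison step described above can be sketched in Python as follows. This is a minimal sketch, not SAS ESP's implementation. It assumes the "uniform histogram" is built from equal-count (quantile) bins on the reference period, so that each reference bin holds 1/k of the observations. It also assumes the divergence is computed for the observed proportions relative to that uniform reference. The function names and the `eps` smoothing for empty bins are hypothetical choices.

```python
import numpy as np

def reference_bin_edges(reference, n_bins=4):
    """Interior bin edges that split the reference values into equal-count bins."""
    return np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])

def bin_proportions(values, edges, eps=1e-6):
    """Share of values in each bin; eps keeps empty bins from producing log(0)."""
    idx = np.searchsorted(edges, values, side="right")  # bin index 0..n_bins-1
    counts = np.bincount(idx, minlength=len(edges) + 1).astype(float)
    p = counts / counts.sum()
    p = np.clip(p, eps, None)
    return p / p.sum()

def kl_divergence(p, q):
    """Discrete KL divergence D(p || q) in nats (natural logarithm)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))
```

For example, suppose `angles` holds the absolute angle of the first principal component. Calling `edges = reference_bin_edges(angles[:500])` and then `kl_divergence(bin_proportions(angles, edges), np.full(4, 0.25))` gives one KL value for the whole period. In a streaming setting, the same computation would be repeated on each new window.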
When you don't have prior knowledge of the expected values of the sensors or of the PCA, the KL divergence offers a standard metric whose decision threshold the user can set a priori. Choosing the optimal decision boundary for the KL divergence is not easy, however. When the number of bins is fixed to a small number, and because the reference distribution is always uniform, you can visualize how much deviation is acceptable.
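A uniform reference makes the scale of the KL value easy to reason about. Assume the divergence is computed for the observed bin proportions $p_i$ relative to the uniform reference $q_i = 1/k$ over $k$ bins, using the natural logarithm. (The post does not state which direction or log base SAS ESP uses.) Under that assumption, the divergence reduces to $\ln k$ minus the entropy $H(P)$ of the observed histogram:

```latex
D_{\mathrm{KL}}(P \,\|\, U) = \sum_{i=1}^{k} p_i \ln\frac{p_i}{1/k}
  = \ln k + \sum_{i=1}^{k} p_i \ln p_i
  = \ln k - H(P), \qquad 0 \le D_{\mathrm{KL}}(P \,\|\, U) \le \ln k
```

The divergence is 0 when the observed histogram is still uniform. It reaches its maximum when every observation falls into a single bin, which for 4 bins is $\ln 4 \approx 1.386$. For a hypothetical 4-bin histogram with proportions (0.1, 0.1, 0.1, 0.7), it is about 0.446.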
In the example below, suppose you don't want to detect a change smaller than the one on the left, but you do want to detect a change similar to or larger than the one on the right. In that case, you can set the KL threshold between 0.0537 and 0.5402.
You can experiment with different distributions and numbers of bins to choose the optimal boundary for your use case. I have included the SAS code that calculates the KL divergence value and creates the graphs above.
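To explore different distributions and bin numbers outside of SAS, a small Python sketch like the one below could be used. It is not the SAS code mentioned above. The `concentrated` helper is a hypothetical family of shifted distributions in which a chosen share of the observations lands in one bin and the rest are spread evenly. The divergence is again assumed to be computed for the observed proportions relative to the uniform reference, in nats.

```python
import numpy as np

def kl_vs_uniform(p):
    """KL divergence (nats) of bin proportions p from the uniform reference 1/k."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nz = p > 0          # 0 * log(0) is treated as 0
    k = len(p)
    return float(np.sum(p[nz] * np.log(p[nz] * k)))

def concentrated(k, share):
    """Hypothetical shift: `share` of the mass in the last bin, the rest spread evenly."""
    p = np.full(k, (1.0 - share) / (k - 1))
    p[-1] = share
    return p

if __name__ == "__main__":
    # One row per bin count, one column per share of mass in the last bin
    for k in (4, 8, 10):
        row = [kl_vs_uniform(concentrated(k, s)) for s in (0.3, 0.5, 0.7, 0.9)]
        print(k, " ".join(f"{v:.4f}" for v in row))
```

A threshold placed between the KL value of the largest change you want to ignore and that of the smallest change you want to catch separates the two. That is the reasoning behind the 0.0537 to 0.5402 range above.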
Tom.
For more information about the Kullback-Leibler divergence: https://blogs.sas.com/content/iml/2020/05/26/kullback-leibler-divergence-discrete.html
Interesting. I can see how this could be very useful in chemical manufacturing, where I used to work. I used PCA many times on chemical data to detect changes, but not with the Change Detection part of the analysis. We used Shewhart charts on the PCA scores to detect changes.
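The Shewhart-chart approach described in this reply, with control limits on the PCA scores, can be sketched as follows. This is a minimal illustration. It assumes the principal component is fitted on a fault-free reference period, and that the control limits are the reference mean ±3 standard deviations of the first-component scores. The function names are hypothetical.

```python
import numpy as np

def pc1_scores(X, ref_end):
    """Project X onto the first principal component fitted on the reference rows."""
    ref = X[:ref_end]
    mean, std = ref.mean(axis=0), ref.std(axis=0, ddof=1)
    Z_ref = (ref - mean) / std
    # Rows of vt are the principal directions of the centered, standardized reference
    _, _, vt = np.linalg.svd(Z_ref, full_matrices=False)
    return ((X - mean) / std) @ vt[0]

def shewhart_flags(scores, ref_end, n_sigma=3.0):
    """Flag scores outside mean +/- n_sigma * sd of the reference scores."""
    ref = scores[:ref_end]
    center, sd = ref.mean(), ref.std(ddof=1)
    return np.abs(scores - center) > n_sigma * sd
```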
Now that I work in banking, we have similar multivariate streams of data, which of course change over time. Yet the default methodology to detect changes is a univariate, single-time-point analysis called the Population Stability Index, which actually compares the current time point to a baseline time point. So the multivariate nature of the data is not used, and the time-series nature of the data is ignored as well. If the data is trending in a certain direction, no signal is found until it crosses a threshold. Even if it has been trending in that direction for 12 consecutive time periods, this tool gives no signal.
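For comparison, the Population Stability Index mentioned above is usually computed over binned proportions as $\sum_i (a_i - e_i)\ln(a_i/e_i)$. Here $e_i$ are the baseline (expected) proportions and $a_i$ are the current (actual) ones. Written that way, it equals the sum of the KL divergences in both directions, which the sketch below checks numerically. The `eps` clipping for empty bins is a hypothetical choice, and bin definitions vary in practice.

```python
import numpy as np

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between baseline and current bin proportions."""
    e = np.clip(np.asarray(expected, dtype=float), eps, None)
    a = np.clip(np.asarray(actual, dtype=float), eps, None)
    e, a = e / e.sum(), a / a.sum()
    return float(np.sum((a - e) * np.log(a / e)))

def kl(p, q):
    """Discrete KL divergence D(p || q) in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))
```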
It seems like PCA and Change Detection fit the banking case perfectly and would provide a much more powerful analysis tool. The problem would be overcoming strong reluctance to use a new methodology in place of, or alongside, a well-known industry standard. Auditors and regulators might also not want to see a new methodology used for these decisions; I don't know. If I ever get a chance to use PCA and Change Detection on banking data, I will give it a try.
Indeed, let me know how this works out for you, and if you have questions, please shout!!
Cheers
Tom.