Perceiving potential problems proactively in complex environments can be perplexing. There may be hundreds of elements that need to be monitored in a system that you have to manage. How do you know which variables are important to track? How do you know which variables may be responsible for causing a system to change from a normal condition to an abnormal condition?
What if you could use a technique to track many disparate continuous numeric variables together, as if they were a single consolidated unit? Would you like to know if that unit starts to change in some way? The variables of interest might come from sensor readings in chemical processes, manufacturing environments, complex electrical or mechanical componentry, engines, or medical or financial systems.
Sometimes you need to compare more than just apples and oranges in a complex system.
This post introduces the Mahalanobis-Taguchi System (MTS) as a fault detection technique for multivariate data. MTS became available in SAS Viya Stable Release 2025.02 as new action sets and procedures in SAS Visual Forecasting. MTS is also supported in SAS Event Stream Processing (ESP) Studio for real-time anomaly detection.
Multivariate Anomaly Detection Technique
The MTS System combines two statistical methods, Mahalanobis distance (MD) and Taguchi orthogonal arrays, which are implemented in the MTS and MTSSCORE procedures (Taguchi, Chowdhury, and Wu 2000).
The MTS System is a statistical method used for pattern recognition, equipment health monitoring, fault analysis and diagnostics of multivariate data. All variables must be continuous and trend-free. Observations with missing values are excluded from training and from scoring. Note: A maximum of 447 variables can be included.
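As a minimal sketch of the data requirements above, the Python helper below applies listwise deletion (any observation with a missing value is dropped) and the 447-variable limit. It assumes observations are plain lists of floats with None or NaN marking a missing value; the name usable_rows is hypothetical, and it does not check that variables are continuous or trend-free.

```python
import math

MAX_MTS_VARIABLES = 447  # limit stated in the text above

def usable_rows(rows, n_vars):
    """Hypothetical helper: drop observations with any missing value
    (None or NaN) and enforce the variable limit. Does not test for
    continuity or trends."""
    if n_vars > MAX_MTS_VARIABLES:
        raise ValueError(f"{n_vars} variables exceeds the limit of {MAX_MTS_VARIABLES}")
    kept = []
    for row in rows:
        if len(row) != n_vars:
            raise ValueError("each observation needs one value per variable")
        if any(v is None or math.isnan(v) for v in row):
            continue  # excluded from training and from scoring
        kept.append(row)
    return kept

rows = [[1.0, 2.0], [None, 3.0], [4.0, float("nan")], [5.0, 6.0]]
print(usable_rows(rows, 2))  # [[1.0, 2.0], [5.0, 6.0]]
```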
The MTS has three phases:
First, proc MTS runs the training process on continuous data representing the normal, fault-free operation of a system. This produces a range of Mahalanobis distance (MD) values for the observed normal operations, which are used to establish an MD threshold.
Then, proc MTSSCORE scores new data by using the model created by proc MTS in the training step. Scoring computes an MD value for each new observation. If the computed MD value for a scored observation exceeds the specified MD threshold, the observation is flagged as an outlier.
Finally, the diagnostics in proc MTSSCORE use a design-of-experiments approach to quantify how much each variable contributes to an abnormal condition. This importance measure, called gain, is computed from signal-to-noise ratios and helps identify the root cause of each detected fault. (See the SAS documentation for formulas and more details.)
Statistical methods
Mahalanobis distance can be seen as a multivariate generalization of the standard z score, which measures the distance, in standard deviations, of a point x from the center of a distribution. It takes the correlations among the variables into account, through the positive definite correlation matrix, to define the “Mahalanobis space.”
For more details, see SAS Help Center: Mahalanobis Distance and "What is Mahalanobis distance?" on The DO Loop blog by Rick Wicklin.
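To make the z-score analogy concrete, the sketch below uses the scaled form common in the MTS literature, MD = (1/k) z^T R^(-1) z, where z holds the k standardized values of an observation and R is the correlation matrix of the normal data. Whether SAS uses exactly this scaling for its normalized MD is an assumption here. With one variable, R = [1] and MD is simply the squared z score. With two correlated variables, a point that breaks the correlation gets a large MD even though neither z score is unusual on its own.

```python
def scaled_md_1d(z):
    """One variable: R = [1], so MD is the squared z score."""
    return z * z

def scaled_md_2d(z1, z2, r):
    """MD = (1/k) z^T R^-1 z for k = 2 standardized values with correlation r,
    using the closed-form inverse of R = [[1, r], [r, 1]] (requires |r| < 1)."""
    quad = (z1 * z1 - 2 * r * z1 * z2 + z2 * z2) / (1 - r * r)
    return quad / 2

print(scaled_md_1d(2.0))                       # 4.0
print(round(scaled_md_2d(1.0, -1.0, 0.9), 3))  # 10.0: breaks the correlation
print(round(scaled_md_2d(1.0, 1.0, 0.9), 3))   # 0.526: follows the correlation
```

Both points are one standard deviation from the mean in each variable, yet their distances differ by a factor of about 19 because only the second one is consistent with the strong positive correlation.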
Taguchi’s orthogonal arrays are used to define a set of cases, called "runs" or "experiments," in each of which one or more variables are excluded. The orthogonal arrays are selected from standard array tables. Each array is a matrix in which each column represents a factor (a variable) and each row represents a run, that is, a combination of factor levels (each variable either included or excluded). Because the columns are balanced against each other, the effect of each factor can be isolated, which allows accurate signal-to-noise ratio computation and precise gain analysis. Proc MTSSCORE uses this analysis to compute a measure of importance called “gain” for each variable.
More details and formulas for this technique can be found in SAS Help Center: Computing Variable Importance Using Orthogonal Arrays.
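The sketch below illustrates the idea on the smallest two-level orthogonal array, L4(2^3), for three variables: level 1 means the variable is included in a run and level 2 means it is excluded. It uses Taguchi's larger-the-better signal-to-noise (S/N) ratio and defines the gain of a variable as the mean S/N ratio over runs that include it minus the mean over runs that exclude it. This is the textbook MTS formulation; the exact formulas in PROC MTSSCORE are documented in SAS Help Center and may differ in detail. The per-run MD values are hypothetical, chosen so that variable 1 drives the abnormal condition.

```python
import math

# L4(2^3) orthogonal array: rows are runs, columns are variables.
# Level 1 = variable included in the run, level 2 = variable excluded.
L4 = [
    [1, 1, 1],
    [1, 2, 2],
    [2, 1, 2],
    [2, 2, 1],
]

def is_orthogonal(array):
    """For a two-level array: every pair of columns contains all four
    level combinations, each the same number of times."""
    n_cols = len(array[0])
    for a in range(n_cols):
        for b in range(a + 1, n_cols):
            counts = {}
            for row in array:
                key = (row[a], row[b])
                counts[key] = counts.get(key, 0) + 1
            if len(counts) != 4 or len(set(counts.values())) != 1:
                return False
    return True

def larger_is_better_snr(mds):
    """Taguchi larger-the-better S/N ratio (dB) over abnormal MD values."""
    return -10 * math.log10(sum(1 / md for md in mds) / len(mds))

def gains(array, run_mds):
    """Gain per variable = mean S/N over runs that include the variable
    minus mean S/N over runs that exclude it. run_mds[i] lists the MD values
    of the abnormal observation(s) computed with run i's included variables."""
    snr = [larger_is_better_snr(mds) for mds in run_mds]
    result = []
    for col in range(len(array[0])):
        inc = [s for s, row in zip(snr, array) if row[col] == 1]
        exc = [s for s, row in zip(snr, array) if row[col] == 2]
        result.append(sum(inc) / len(inc) - sum(exc) / len(exc))
    return result

# Hypothetical MDs of one abnormal observation for each of the 4 runs.
run_mds = [[12.0], [10.0], [1.2], [1.0]]
print(is_orthogonal(L4))  # True
print(gains(L4, run_mds))
```

With these values, variable 1 receives a gain of about 10, variable 2 about 0.8, and variable 3 essentially zero, so variable 1 would stand out in a gains heatmap.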
Training a model
Proc MTS trains the model on multivariate data representing the normal, fault-free operation of the environment you are analyzing, and establishes the Mahalanobis space of the normal observations. This space consists of the vector of means and the inverse covariance matrix, which are used to compute the range of MD values for normal operating conditions. The procedure can also output normalized MD values, which makes the scale independent of the number of variables in the system. The MD values for normal operations are then used to choose an MD threshold that exceeds the maximum MD value observed during normal operations. The recommended MD threshold is between 3 and 4.
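The training step can be sketched in Python, assuming the Taguchi convention of standardizing each variable with its sample standard deviation (divisor n - 1) and inverting the correlation matrix, which is the covariance matrix of the standardized data. PROC MTS's internal computation may differ in detail, and all names are hypothetical. One useful property of this scaling: over the training data, the scaled MD averages exactly (n - 1)/n, close to 1, which helps explain why a threshold of 3 to 4 sits well above typical normal values.

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def train(normal):
    """Mahalanobis space of the normal data: means, standard deviations
    (divisor n - 1), and the inverse correlation matrix."""
    n, k = len(normal), len(normal[0])
    means = [sum(row[j] for row in normal) / n for j in range(k)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in normal) / (n - 1))
            for j in range(k)]
    z = [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in normal]
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(k)]
            for a in range(k)]
    return means, stds, invert(corr)

def scaled_md(row, model):
    """MD = (1/k) z^T R^-1 z for one observation."""
    means, stds, corr_inv = model
    z = [(x - m) / s for x, m, s in zip(row, means, stds)]
    k = len(z)
    return sum(z[a] * corr_inv[a][b] * z[b]
               for a in range(k) for b in range(k)) / k

def threshold_ok(train_mds, threshold=3.0):
    """The threshold should exceed every MD seen during normal operation."""
    return max(train_mds) < threshold

# Hypothetical fault-free data: 6 observations of 3 sensors.
normal = [
    [1.0, 2.0, 0.5],
    [2.0, 2.9, 0.1],
    [3.0, 4.2, 0.7],
    [4.0, 4.8, 0.2],
    [5.0, 6.1, 0.9],
    [6.0, 7.0, 0.4],
]
model = train(normal)
train_mds = [scaled_md(row, model) for row in normal]
print(round(sum(train_mds) / len(train_mds), 4))  # 0.8333, i.e. (n - 1) / n
print(threshold_ok(train_mds))                    # True
print(scaled_md([3.5, 1.0, 0.5], model) > 3.0)    # True: breaks the x1-x2 correlation
```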
Example Scenario
In this example, we train a model using simulated data for jet engines running in a fault-free operating state (the first 100 observations). Each engine in the training data has data from 24 different sensors.
Each observation contains data from an individual flight segment (or cycle) for a specific engine. The ID variable datetime identifies the cycle, and the BY-group variable engine identifies the engine.
The simulation continues to run each engine, starting at observation 101, until failure. Data source: (Saxena and Goebel 2008).
Proc MTS trains the model, which is saved by the SAVESTATE statement. The following code was run in SAS Studio. Refer to SAS Help Center: Syntax: MTS Procedure for descriptions of the other statements.
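/* Train an MTS model on the fault-free cycles */
PROC MTS data=CASUSER.CENSORtrain;
   input x1-x24;      /* 24 continuous sensor variables */
   by engine;         /* separate analysis for each engine */
   id datetime;       /* identifies each cycle in the output */
   /* output tables: results, mean vector, covariance, statistics, scoring info */
   output out=casuser.out
          mean=casuser.mean
          cov=casuser.cov
          stat=casuser.stat
          scoreinfo=casuser.scoreinfo;
   savestate rstore=casuser.model;   /* save the trained model for PROC MTSSCORE */
run;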
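The BY statement works as in other SAS procedures: each engine is a separate BY group, so each engine gets its own Mahalanobis space rather than one model pooled across all engines. The Python sketch below mimics that grouping, assuming each record is a dict keyed by column name. It only computes per-engine means and standard deviations, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def split_by_group(records, by_key):
    """Partition records on the BY variable, like a BY statement."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[by_key]].append(rec)
    return dict(groups)

def per_group_profile(records, by_key, inputs):
    """(mean, standard deviation) of each input variable within each BY group.
    A full MTS model would also keep each group's inverse correlation matrix."""
    return {
        key: {var: (mean(r[var] for r in rows), stdev(r[var] for r in rows))
              for var in inputs}
        for key, rows in split_by_group(records, by_key).items()
    }

# Hypothetical rows: two engines, two sensors, two cycles each.
records = [
    {"engine": 123, "datetime": 1, "x1": 1.0, "x2": 10.0},
    {"engine": 123, "datetime": 2, "x1": 3.0, "x2": 14.0},
    {"engine": 135, "datetime": 1, "x1": 5.0, "x2": 20.0},
    {"engine": 135, "datetime": 2, "x1": 7.0, "x2": 22.0},
]
profiles = per_group_profile(records, "engine", ["x1", "x2"])
print(profiles[123]["x1"])  # (2.0, 1.4142135623730951)
print(profiles[135]["x1"])  # (6.0, 1.4142135623730951)
```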
The chart below is generated by the procedure. Each point represents the normalized Mahalanobis distance for one cycle, or snapshot of measurements, for engine=123 under normal operating conditions. All observations for the first 100 cycles fall well below the shaded default MD threshold of 3.
Next, we use the trained model to score the cycles after the first 100 that were used for training. The Monitoring chart below identifies when engine=123 moves out of its normal operating range: outliers start to appear around cycle 250, and the MD value grows progressively larger as the engine continues to operate.
The bottom chart (MTS Diagnostics Gains) is a heatmap of the gain values for each of the 24 input variables. The gains measure variable importance and are computed with Taguchi orthogonal arrays, so this chart identifies which variables contribute to the fault conditions. Notice that as the cycles advance down the y-axis, the changing colors of the heatmap show the gain increasing for inputs X9 and X15. Variable X9 represents TotBypassPressure and variable X15 represents FuelPressureRatio. Note: For convenience, the variables in this chart are ordered by decreasing overall gain.
The code used to create the previous 2 charts is shown below. Casuser.model is the model trained by Proc MTS.
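/* Score the cycles after training with the saved model */
PROC MTSSCORE data=casuser.censorscore model=casuser.model;
   by engine;   /* score each engine separately */
   /* output tables: scoring results, per-variable gains, statistics */
   output out=casuser.outscore gains=casuser.outgains stat=casuser.outstats;
run;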
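Once the scored and gains tables are brought into Python, the two charts can also be summarized programmatically. The sketch below assumes hypothetical data layouts, since the post does not show the columns of casuser.outscore or casuser.outgains. first_exceedance returns the first cycle whose MD exceeds the threshold; its run_length argument is a hypothetical addition that requires several consecutive exceedances, so that an isolated point just above the threshold is not reported as the onset of a fault. rank_by_total_gain orders variables by decreasing gain summed over cycles, assuming that is what "overall gain" means for the heatmap ordering.

```python
def first_exceedance(cycles, mds, threshold=3.0, run_length=1):
    """First cycle that starts `run_length` consecutive MDs above threshold,
    or None if no such run exists."""
    streak = 0
    for i, md in enumerate(mds):
        streak = streak + 1 if md > threshold else 0
        if streak == run_length:
            return cycles[i - run_length + 1]
    return None

def rank_by_total_gain(gains_by_cycle):
    """Variables ordered by decreasing gain summed over all scored cycles."""
    totals = {}
    for row in gains_by_cycle:
        for var, g in row.items():
            totals[var] = totals.get(var, 0.0) + g
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical scoring output for one engine.
cycles = [101, 102, 103, 104, 105, 106]
mds = [1.2, 3.4, 1.1, 3.2, 3.9, 4.5]
print(first_exceedance(cycles, mds))                # 102
print(first_exceedance(cycles, mds, run_length=3))  # 104

gains_by_cycle = [
    {"x1": 0.1, "x9": 0.5, "x15": 0.2},
    {"x1": 0.0, "x9": 2.0, "x15": 1.5},
]
print(rank_by_total_gain(gains_by_cycle))  # ['x9', 'x15', 'x1']
```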
Now let’s look at another engine. The following example is from engine=135. Notice that during normal operating conditions this engine has two outliers just above the MD threshold of 3. If these anomalies were caused by a faulty sensor that was subsequently replaced, these observations could be removed from the training data with the optional OUTLIER statement in proc MTS, which uses the Support Vector Data Description (SVDD) technique to identify outliers for removal. This is useful because training data is not always clean. For further details, see SAS Help Center: OUTLIER Statement.
When PROC MTSSCORE scores the data for engine=135, the first chart below highlights the cycles that exceed the MD threshold. The second MTS Diagnostics Gains chart shows that input X9 has been identified as a major factor in this fault condition starting at cycle 116, as indicated by the red sections in the heatmap. This provides useful insights for ongoing fault diagnosis and engine maintenance.
The Mahalanobis-Taguchi System is a powerful diagnostic technique that provides valuable insights for many scientific and business applications. A Mahalanobis distance model trained on a system operating in a normal, fault-free state represents the benchmark, or expected, environment. Comparing readings from a current operational instance to the benchmark automatically identifies events that deviate from normal operating conditions, that is, events whose MD exceeds the MD threshold.
Scoring the current operational state against the benchmark model also produces a ‘gain’ metric for each input variable. Comparing the gain values identifies the input variables that contribute to the anomalous condition, which provides insight into the cause of the anomaly.
In my next post, I will describe how to use the MTS System with streaming event data that is processed in real time by the SAS Event Stream Processing (ESP) application.
And what’s wrong with comparing apples and oranges anyway?
Find more articles from SAS Global Enablement and Learning here.