Have you ever been asked to find the ‘smoking gun’ for a high-cost critical asset failure? What about identifying predictive asset degradation patterns? Were you using high-frequency multivariate data, making the exercise feel more like you were looking for a ‘smoking needle’ in a haystack? I have. It wasn’t easy.
Fortunately, SAS’ Advanced Analytics R&D group released several new machine learning algorithms in 17W12 designed for condition monitoring and anomaly detection of high-frequency multivariate data. SAS’ new Support Vector Data Description (SVDD) procedure will speed up your ‘smoking gun’ investigation by generating one output model instead of the dozens, if not hundreds, typically created. As of May 23rd, you can export the resulting SVDD score code directly into SAS Event Stream Processing (ESP) to detect these system anomalies and alert on them in near real-time.
Traditional Asset Degradation Investigation Approaches
So you can appreciate how slick the new SVDD procedure is, let’s consider typical approaches for condition monitoring and anomaly detection for high-frequency multivariate data.
The data showcased in this blog is NASA’s 2008 Prognostics and Health Management Challenge Data Set (PHM08), which simulates turbofan engine degradation. The data contains 26 variables: engine ID, cycle number, three operational settings, and 21 sensor measurements. It covers 218 different engines with end of useful life ranging between 128 and 357 engine cycles. The training data contains the first 25% of engine cycle sensor measurements for 30 randomly sampled engines. The validation data contains all data for the remaining 188 engines. We assume that the training data represents normal operating conditions.
Typically, analysts start with visual and descriptive modeling approaches to perform condition monitoring and anomaly detection. I like to use JMP for initial explorations. The following video showcases common JMP tools used to explore and identify asset degradation.
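proc sql noprint;
/* Assumption: work.svdd_results is a hypothetical name for the PROC SVDD training results table (Description, value); substitute your own */
select value into :nsv
from work.svdd_results
where Description = "Number of Support Vectors";
select value into :radius
from work.svdd_results
where Description = "Threshold R^2 Value";
select value into :time
from work.svdd_results
where Description = "Run Time (Seconds)";
/* Append one row for the bandwidth value just tested (&s) */
insert into work.summary
values(&s, &nsv, &radius, &time);
quit;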
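/* Identify optimal bandwidth parameter value */
proc sort data=work.summary;
by s;
run;
data work.summarize;
set work.summary;
label d_r="First Derivative" d2_r="Second Derivative" radius="SVDD Radius";
/* Call LAG on every row: a LAG inside an IF only updates its queue when the condition is true */
lag1_r = lag(radius);
lag2_r = lag2(radius);
if _n_ > 1 then d_r = (radius - lag1_r) / (&sby);
if _n_ > 2 then d2_r = (radius - 2*lag1_r + lag2_r) / (&sby*&sby);
lag_d2 = lag(d2_r);
/* Flag a sign change in the second derivative; skip missing values, which SAS treats as less than zero */
if nmiss(d2_r, lag_d2) = 0 and d2_r * lag_d2 < 0 then flag = 1;
else flag = 0;
drop lag1_r lag2_r lag_d2;
run;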
/* Save optimal bandwidth parameter value as &optimal */
proc sql noprint;
select min(s) into :optimal trimmed
from work.summarize
where flag = 1;
/* Save threshold radius associated with optimal bandwidth parameter value as &threshold */
select radius into :threshold trimmed
from work.summarize
where s = &optimal;
quit;
/* Plot the values of the first and second derivative of radius = f[s] */
proc sgplot data=work.summarize;
title H=14pt "Optimal Bandwidth Parameter Value";
footnote H=8pt j=l italic "Optimal Bandwidth Parameter when radius function's second derivative crosses zero.";
series x=s y=d2_r / lineattrs=(color=blue thickness=2) y2axis;
series x=s y=radius / lineattrs=(color=purple pattern=dash);
series x=s y=d_r / lineattrs=(color=orange pattern=dash) y2axis;
refline 0 /axis=y2 label="BW Threshold" lineattrs=(color=black) labelloc=inside;
refline &optimal / axis=x lineattrs=(color=black) labelloc=inside;
inset "Optimal BW value = &optimal" / border position=bottomleft;
y2axis label="SVDD Radius First & Second Derivative Values";
yaxis label="SVDD Radius Values";
xaxis label="Bandwidth Parameter Values";
run;
The following plot shows the Gaussian bandwidth parameter value (s) on the x-axis, the SVDD radius values on the y1-axis, and the first and second derivatives of the SVDD radius on the y2-axis. We select the smallest value of s at which the second derivative of the SVDD radius function crosses zero.
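As a compact restatement of what the code above computes, the derivatives are backward finite differences over the grid of tested bandwidth values. The notation here is my own shorthand rather than anything from the SVDD procedure: r_i is the radius recorded for the i-th bandwidth value s_i, and Δs is the grid step (&sby in the code).

```latex
d_i = \frac{r_i - r_{i-1}}{\Delta s},
\qquad
d^{(2)}_i = \frac{r_i - 2\,r_{i-1} + r_{i-2}}{(\Delta s)^2},
\qquad
s_{\text{optimal}} = \min \left\{ s_i \;:\; d^{(2)}_i \, d^{(2)}_{i-1} < 0 \right\}
```

A product of consecutive second differences below zero means the sign flipped between s_{i-1} and s_i, which is the zero crossing marked in the plot.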
Now you need to update your SVDD model using the optimal bandwidth parameter value.
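proc svdd data=casuser.data_std nthreads=4;
input &vars / level=interval;
where partition = &training_flg;
kernel rbf / bw=&optimal;
/* Save the trained model as an analytic store so it can be scored */
savestate rstore=casuser.state_s;
run;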
Now you can score and plot your updated SVDD model using validation data.
/* Score the validation data with the saved analytic store */
proc astore;
score data=casuser.data_p out=casuser.all_out rstore=casuser.state_s;
quit;
proc sgplot data=casuser.all_out;
title H=14pt "Anomaly Detection using SVDD";
footnote H=8pt j=l italic "Anomalies when SVDD distance exceeds SVDD Radius Threshold.";
series x=&xaxis_var y=_SVDDdistance_;
refline &threshold / label="SVDD Radius Threshold" lineattrs=(color=red) labelpos=max;
where engine = 5;
run;
Here is an example of the single SVDD output chart needed to investigate asset degradation. SVDD Distance (_SVDDdistance_) is one of two SVDD scoring output variables; the other is _SVDDscore_. Plotting SVDD Distance against engine cycle, or another time-equivalent variable, is an easy way to perform condition monitoring and track asset degradation.
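If you want a table rather than a chart, the same comparison can be expressed as a query. This is a minimal sketch that assumes the scored table casuser.all_out and the macro variables &xaxis_var and &threshold from the code above are still available; first_alert is a hypothetical column name introduced only for illustration.

```sas
/* Sketch only: reuses casuser.all_out, &xaxis_var and &threshold from the steps above. */
/* first_alert is a hypothetical column name, not an SVDD output variable. */
proc sql;
  /* Earliest cycle, per engine, at which the SVDD distance exceeds the threshold */
  select engine,
         min(&xaxis_var) as first_alert
  from casuser.all_out
  where _SVDDdistance_ > &threshold
  group by engine;
quit;
```

Engines that never exceed the threshold simply do not appear in the result, which may itself be worth reviewing with a subject matter expert.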
NOTE: In this example, we achieved better results by not standardizing the multivariate data prior to running SVDD. In other applications, however, you may want to try standardizing multi-scaled data. Please refer to my blog "To standardize data or not to standardize data – that is the question" for more information about standardizing data, including sample CAS action code.
As illustrated above, SVDD quickly generated a single output chart that effectively flagged turbofan engine degradation. As of May 23rd, you can export the resulting SVDD score code directly into SAS ESP to alert on system anomalies and identify asset degradation in near real-time.
We also tested the SVDD approach on high-frequency multivariate data from a chemical manufacturing process. To train an SVDD model, you must know how to define "normal" operation. We recommend working with the customer's subject matter expert to get this information. Since we could not define an exact period of stable "normal" operation for the chemical plant data, we increased the SVDD outlier fraction parameter to 0.01. The outlier fraction is inversely proportional to the penalty constant that controls the trade-off between hypersphere volume and modeling accuracy. For our test, this worked extremely well. In other applications, however, please consult the customer's subject matter expert for their recommendations. Here is the resulting SVDD output chart, which clearly detected unstable system operation prior to the chemical manufacturing process' unplanned system disturbances on June 7th and June 12th.
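For readers who want to see the trade-off behind the outlier fraction written out, here is the standard SVDD formulation as I understand it from the general literature. The symbols are assumptions of this sketch rather than names used by the procedure: a is the hypersphere center, R its radius, ξ_i the slack for observation x_i, n the number of training observations, and f the outlier fraction.

```latex
\min_{R,\,a,\,\xi} \; R^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
\lVert x_i - a \rVert^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n,
\qquad \text{with} \quad C = \frac{1}{n f}
```

Raising f lowers C, so observations falling outside the hypersphere are penalized less and the boundary is less influenced by unusual points in the training data. That is why a larger outlier fraction helped when a clean "normal" period could not be isolated.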
Support Vector Data Description (SVDD) is a new machine learning algorithm well suited to condition monitoring and anomaly detection for high-frequency multivariate data. The algorithm is now available in SAS Visual Data Mining and Machine Learning 8.1 on SAS Viya 3.2. It is easy to use: it generates one output that quickly detects system anomalies and can be integrated into SAS ESP.
Looking for more information about SVDD? Please take a look at our NEW VLE workshop: Visual Statistics 8.1 | Visual Data Mining and Machine Learning 8.1 on SAS Viya 3.2: Essentials which includes information about SVDD, including a hands-on lab.
I would like to say a special thank you to Ryan Gillespie, Dev Kakde, Arin Chaudhuri, Gul Ege, Byron Biggs and Anya McGuirk for their technical assistance developing the code included in this blog and confirming optimal SVDD configurations.