Use of Machine Learning Algorithms for Anomaly
Detection in Particle Accelerator Technical Infrastructure
Lorenzo Giusti
lor.giusti@icloud.com
23/04/2020
Problem
• We want to anticipate upcoming failures in the accelerators’ technical infrastructure
• In the current situation, a device is checked only once its internal temperature rises
above a fixed threshold level (37.5 °C)
• The monitoring framework is constantly supervised by engineers who are responsible
for thousands of devices and can miss an alarm among the millions received
every day
• Thermal runaways cannot be caught by human supervision alone: a sudden
increase of the internal temperature can lead to an explosion
Motivation
• Devices are not flawless:
• Reliability and performance decrease over time
• Faults can suddenly put devices out of service and impact overall availability
• Preventive and corrective maintenance is expensive and impacts operating time
• Predictive maintenance and anticipated interventions:
• Reduce risks
• Decrease downtime
• Increase reliability and overall availability
Approaches
• Mahalanobis Distance (see the sketch after this list):
• Measures the distance between a point and a distribution
• If the distance is above a certain threshold, the point is considered
anomalous
• Isolation Forest:
• Splits the space “randomly” until a point is isolated
• The probability of a point being non-anomalous is proportional
to the number of splits needed to isolate it
• Residual Autoregressive Score:
• Compute the norm between the actual time series and the one
predicted by an autoregressive model (e.g. ARIMA)
• If the gradient of the norm is monotonically increasing, the
time series is classified as anomalous
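To make the first approach concrete, here is a minimal Python sketch of Mahalanobis-distance anomaly scoring; the synthetic data, function names, and the 99.5th-percentile threshold are illustrative assumptions, not the deck’s actual implementation.

```python
import numpy as np

def mahalanobis_scores(X_ref, X):
    """Distance of each row of X from the distribution of X_ref."""
    mu = X_ref.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X_ref, rowvar=False))
    diff = X - mu
    # sqrt(diff @ cov_inv @ diff) computed row-wise
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 3))                     # nominal behaviour
X_test = np.vstack([rng.normal(size=(5, 3)),             # normal points
                    rng.normal(6.0, 1.0, size=(2, 3))])  # anomalous points
threshold = np.percentile(mahalanobis_scores(X_train, X_train), 99.5)
print(mahalanobis_scores(X_train, X_test) > threshold)   # last two flagged
```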
Solution
• Machine Learning based anomaly detection:
• Real-time monitoring with unsupervised detection
• Device-independent, with generalized algorithms
• Independent of environmental and periodic operational conditions
• State-of-the-art artificial intelligence algorithms
• Faults are predicted with significant lead time (i.e. days to weeks before failure)
• Anomalies of different types and threshold levels are detected
Failure Analysis Framework Architecture
[Pipeline diagram: noisy temperature signal as input → Extract, Load and Transform → engineered features as output]
• Our framework uses only the significant features of the devices
• i.e. only the temperature sensors for the collimators
• Extract Load Transform pipeline (sketched below):
• Identify and remove seasonal components from the signal
• Filter out the environmental noise (e.g. Gaussian smoothing)
• Derive additional features in order to gain more insight into the physical phenomena of
interest
• At the end, we homogenize the range and variability of the extracted data
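A minimal Python sketch of these four ETL steps, assuming an hourly temperature series; the window sizes, smoothing parameter, and derived features are illustrative assumptions rather than the deck’s actual pipeline.

```python
import pandas as pd
from scipy.ndimage import gaussian_filter1d

def preprocess(temp: pd.Series) -> pd.DataFrame:
    # 1. Remove the (assumed daily) seasonal component with a 24-sample rolling mean.
    deseasonalized = temp - temp.rolling(24, min_periods=1).mean()
    # 2. Filter out environmental noise with Gaussian smoothing.
    smoothed = pd.Series(gaussian_filter1d(deseasonalized.to_numpy(), sigma=3.0),
                         index=temp.index)
    # 3. Derive additional features (first difference, short-term variability).
    feats = pd.DataFrame({
        "temp": smoothed,
        "delta": smoothed.diff().fillna(0.0),
        "rolling_std": smoothed.rolling(12, min_periods=1).std().fillna(0.0),
    })
    # 4. Homogenize range and variability (z-score normalization).
    return (feats - feats.mean()) / feats.std()
```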
Neural Networks
• Neural Networks are computing systems inspired by the biological
brain
• They learn to perform tasks by considering examples, without being
programmed with task-specific rules
• The original goal of the neural approach was to solve problems in the same
way a human brain would; attention has since shifted to performing specific
tasks
McCulloch, W.S., Pitts, W., (1943), “A logical calculus of the ideas immanent in nervous activity”, Bulletin of Mathematical Biophysics.
Recurrent Neural Networks
• In handling generic sequences of data, simple feed-forward neural networks
have severe limitations, especially for:
• Sequences of variable length (e.g. environmental processes)
• Sequences with long-term dependencies
• Recurrent neural networks (RNNs) are a class of neural networks naturally
able to exhibit temporal dynamic behavior
D. E. Rumelhart, G. E. Hinton, and R. J. Williams, (1986), “Learning internal representations by error propagation”.
• RNNs are susceptible to vanishing gradients when processing long sequences
Recurrent Neural Networks
• Sharing parameters across time steps allows the model to extend to
samples of different lengths and to generalize across them:
• The output is produced by applying the same update rule to the previous
outputs
• RNNs introduce the concept of cycles in the computational graph:
• Cycles model the influence of the value at time t on the value at time t + 𝜏
I. Goodfellow, Y. Bengio, and A. Courville, (2015), “Deep Learning”.
• The hidden state at time t can be considered a summary of all the previous
values processed by the RNN (see the sketch below)
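A minimal NumPy sketch of this recurrence: the same parameters are applied at every step, and the final hidden state summarizes the whole sequence. Shapes, names, and weight scales are illustrative assumptions.

```python
import numpy as np

def rnn_states(x_seq, W_xh, W_hh, b_h):
    """x_seq: (T, n_in); returns all hidden states, shape (T, n_hidden)."""
    h, states = np.zeros(W_hh.shape[0]), []
    for x_t in x_seq:                       # same update rule at every step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
params = (0.1 * rng.normal(size=(8, 4)),    # W_xh: input-to-hidden
          0.1 * rng.normal(size=(8, 8)),    # W_hh: hidden-to-hidden (the cycle)
          np.zeros(8))                      # b_h
h_all = rnn_states(rng.normal(size=(50, 4)), *params)
h_final = h_all[-1]                         # summary of the whole sequence
```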
LSTM Networks
• The forget gate’s vector of activations lets the network better control the
gradient values, preventing them from vanishing (see the sketch below)
• Well suited to classifying, processing, and making predictions based on time
series, since there can be lags of unknown duration between important events
in a time series
• LSTM Networks are a type of recurrent model that has been used in many
sequence-learning tasks such as speech recognition and time series forecasting
Sepp Hochreiter, Jürgen Schmidhuber, (1997), “Long short-term memory”, Neural Computation.
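To make the gating mechanism concrete, here is a minimal NumPy sketch of a single LSTM step in the standard formulation; the dictionary-based parameter names are illustrative assumptions, not the deck’s code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step; W, U, b are dicts of parameters for gates f, i, o, g."""
    f = sigmoid(W["f"] @ x + U["f"] @ h + b["f"])  # forget gate
    i = sigmoid(W["i"] @ x + U["i"] @ h + b["i"])  # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h + b["o"])  # output gate
    g = np.tanh(W["g"] @ x + U["g"] @ h + b["g"])  # candidate cell update
    c_new = f * c + i * g       # f near 1 keeps the gradient path through c open
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```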
Bidirectional Networks
• Bidirectional LSTM Networks are built by putting two independent
networks together (see the sketch below)
• The input sequence is fed in normal time order to one network and in
reversed time order to the other
• This structure gives the network both backward and forward
information about the sequence at every time step
M. Schuster and K. K. Paliwal, (1997), “Bidirectional recurrent neural networks”, IEEE Transactions on Signal Processing.
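A minimal NumPy sketch of the bidirectional idea, reusing `rnn_states` from the earlier RNN sketch: one pass in normal time order, one on the reversed sequence, with the states concatenated at every step. Shapes are illustrative assumptions.

```python
import numpy as np
# Reuses rnn_states(x_seq, W_xh, W_hh, b_h) from the earlier RNN sketch.

def bidirectional_states(x_seq, fwd_params, bwd_params):
    h_fwd = rnn_states(x_seq, *fwd_params)              # normal time order
    h_bwd = rnn_states(x_seq[::-1], *bwd_params)[::-1]  # reversed, then re-aligned
    # Every time step now carries both past (forward) and future (backward) context.
    return np.concatenate([h_fwd, h_bwd], axis=1)       # (T, 2 * n_hidden)
```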
Autoencoders
• The principal aim is to learn a representation of a set of data by training the
network to ignore signal noise
• By learning to replicate the most salient features of the data, the model is
encouraged to precisely reproduce the most frequent characteristics:
• When facing anomalies, the model should fail the reconstruction
• The reconstruction error of a data point is used as an anomaly score to detect
anomalies
Kramer, M.A., (1991), “Nonlinear principal component analysis using autoassociative neural networks”, AIChE J.
Autoencoders
• Formally, Autoencoders are feedforward neural networks whose aim is
to learn to copy their input to their output: ℐ(𝐱) = 𝐱
• As it is, the identity function itself is not very useful, unless we force the model to
prioritize which aspects of the input should be copied, leading the model to
learn useful properties of the input data
• The main idea is to split the identity function into two parts: ℐ(𝐱) = 𝑔(𝑓(𝐱))
• 𝐡 = 𝑓(𝐱) is the encoder function, which maps the input to an internal
representation, or code
• 𝐫 = 𝑔(𝐡) is the decoder function, which produces a reconstruction of
the input from the code produced by the previous mapping
• To capture the most salient features of the input, the encoder and decoder
functions must be of the form:
𝑓: ℝⁿ → ℝᵐ, 𝑔: ℝᵐ → ℝⁿ, with 𝑚 < 𝑛
Autoencoders
• An Autoencoder whose code dimension is less than the input dimension is
called undercomplete
• The learning process of an undercomplete autoencoder is described as:
𝐖* = argmin_𝐖 𝐿(𝐱, 𝑔(𝑓(𝐱))) = argmin_𝐖 ‖𝐱 − 𝑔(𝑓(𝐱))‖²
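A minimal Keras sketch of an undercomplete autoencoder trained with exactly this objective (mean squared error between 𝐱 and 𝑔(𝑓(𝐱))); the layer sizes are illustrative assumptions.

```python
import tensorflow as tf

n, m = 32, 8                                     # input dim n, code dim m < n
inp = tf.keras.Input(shape=(n,))
code = tf.keras.layers.Dense(m, activation="relu")(inp)    # encoder f
out = tf.keras.layers.Dense(n, activation="linear")(code)  # decoder g
autoencoder = tf.keras.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")  # ‖x − g(f(x))‖² objective
```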
Autoencoders
• Using proper activation functions leads the autoencoder to learn a
generalization of principal component analysis under a kernel function:
• If the neuron activations are linear and the loss function is the
mean squared error, it can be shown that the encoder learns an
approximation of principal component analysis
• There are several variants of the undercomplete autoencoder:
• Regularized Autoencoders:
• The loss function has additional terms that force the model to have other
properties besides the ability to reproduce the identity function
• Sparse Autoencoders:
• The loss function has an additional sparsity penalty 𝛀(𝐡) (usually the L1 reg.)
which induces sparsity in the code
• Denoising Autoencoders (see the sketch after this list):
• Before the training phase, the training data is altered by some form of noise:
• The model becomes more stable and robust to the induced corruption
• Variational Autoencoders:
• Perform variational inference on the distribution of the training data ☠
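A minimal sketch of the denoising variant, reusing the `autoencoder` from the previous sketch: the inputs are corrupted with Gaussian noise while the training targets stay clean. The noise level and the random `X_train` array are illustrative assumptions.

```python
import numpy as np
# Reuses `autoencoder` (input dim 32) from the previous sketch.

def add_noise(X, sigma=0.1, seed=0):
    # Corrupt the inputs; the training targets remain the clean data.
    return X + np.random.default_rng(seed).normal(0.0, sigma, X.shape)

X_train = np.random.default_rng(1).normal(size=(512, 32)).astype("float32")
autoencoder.fit(add_noise(X_train), X_train, epochs=10, batch_size=64)
```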
How to detect extreme rare events
• A Bidirectional Long Short-Term Memory Autoencoder addresses this with a modelling
approach that:
• Learns and reconstructs the nominal behaviour of a time series
• Uses the signal-versus-reconstruction error to detect anomalies
• Devices’ normal behaviour is often affected by external factors or variables that are not
evident when analysing signal behaviour over time, due to:
• Unmonitored or unknown environmental conditions
• Additional harsh conditions that add noise, e.g. radiation dose
• Measurement and data acquisition errors, i.e. the difference between the measured value of a quantity and
its true value
Extreme rare events detection
[Diagrams: LSTM Autoencoder and Bidirectional LSTM Autoencoder architectures]
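A minimal Keras sketch of a Bidirectional LSTM Autoencoder of the kind shown in the diagram; the window length, feature count, and layer sizes are illustrative assumptions, not the deck’s actual configuration.

```python
import tensorflow as tf

T, n_features, n_units = 48, 3, 16       # window length, features, LSTM units

inp = tf.keras.Input(shape=(T, n_features))
# Encoder: a bidirectional LSTM compresses the window into a code vector.
code = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(n_units))(inp)
# Decoder: repeat the code T times and unroll it back into a sequence.
x = tf.keras.layers.RepeatVector(T)(code)
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(n_units, return_sequences=True))(x)
out = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_features))(x)

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")   # reconstruction error objective
```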
Anomalies as Outliers in the Reconstruction Error Distribution
• The model learns how to encode the normal behavior of our devices
• We set a threshold as an extreme value of the distribution of the reconstruction
errors on data we assume to be anomaly-free (sketched below)
• Points whose reconstruction error exceeds the threshold are anomalies
[Plots: reconstruction error under normal behavior vs. abnormal behavior]
• Subsequent anomalies trigger a critical alarm on the monitoring framework
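A minimal sketch of this thresholding rule, reusing `model`, `T`, and `n_features` from the previous sketch: the threshold is fitted as an extreme percentile of reconstruction errors on windows assumed to be anomaly-free. The percentile and the synthetic arrays are illustrative assumptions.

```python
import numpy as np
# Reuses `model`, `T`, and `n_features` from the previous sketch.

def reconstruction_errors(model, X):
    # Mean squared error per window between input and reconstruction.
    return np.mean((X - model.predict(X, verbose=0)) ** 2, axis=(1, 2))

rng = np.random.default_rng(0)
X_nominal = rng.normal(size=(256, T, n_features)).astype("float32")  # assumed anomaly-free
X_live = rng.normal(size=(32, T, n_features)).astype("float32")      # monitored windows

threshold = np.percentile(reconstruction_errors(model, X_nominal), 99.9)
is_anomaly = reconstruction_errors(model, X_live) > threshold
```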
Results
[Plots: left, no anomalies detected and the temperature never exceeds the critical level; right, anomaly detected on 06-19-2019, with the temperature exceeding the threshold on 07-23-2019]
Conclusions & Future developments
• With these techniques we have shown that it is possible to predict well in advance
whether a device deviates from its nominal behavior, thus predicting a potential fault
• It is also possible to assess the criticality of the detected anomaly
• We aim to generalize the framework to a multi-system anomaly detector, covering
the following systems:
• Uninterruptible Power Supplies and, more generally, batteries ✔
• Collimators ✔
• Electrical Transformers ✘
• Hydraulic pumps ✘
• Compressors ✘
• Future features will also be added to infer the type of anomaly and its extent
(sensor, component, sub-system, system)
References
Zhang, C., et al., (2018), “A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis
in Multivariate Time Series Data”, arXiv:1811.08055v1.
Marchi, E., et al., (2015), “A Novel Approach for Automatic Acoustic Novelty Detection Using a
Denoising Autoencoder with Bidirectional LSTM Neural Networks”, ICASSP.
Sakurada, M., Yairi, T., (2014), “Anomaly Detection Using Autoencoders with Nonlinear
Dimensionality Reduction”, ACM.
Zhou, C., Paffenroth, R. C., (2017), “Anomaly Detection with Robust Deep Autoencoders”, KDD.
Malhotra, P., et al., (2016), “LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection”, ICML.
Gong, D., et al., (2019), “Memorizing Normality to Detect Anomaly: Memory-augmented Deep
Autoencoder for Unsupervised Anomaly Detection”, arXiv:1904.02639v2.
alDosari, M. S., (2016), “Unsupervised Anomaly Detection in Sequences Using Long Short Term
Memory Recurrent Neural Networks”.
Thank you!
