Use of Machine Learning Algorithms for Anomaly
Detection in Particle Accelerator Technical Infrastructure
Lorenzo Giusti
lor.giusti@icloud.com
23/04/2020
Problem
• We want to anticipate upcoming failures in the accelerators’ technical infrastructure
• In the current situation, a device is checked only once its internal temperature rises
above a fixed threshold level (37.5 °C)
• The monitoring framework is constantly supervised by engineers who are responsible
for thousands of devices and can miss an alarm among the millions received
every day
• Thermal runaways cannot be caught by human supervision alone: a sudden
increase of the internal temperature can lead to an explosion
Motivation
• Devices are not flawless:
• Reliability and performance decrease over time
• Faults can suddenly put devices out of service and impact overall availability
• Preventive and corrective maintenance is expensive and impacts operating time
• Predictive maintenance and anticipated interventions:
• Reduce risks
• Decrease downtime
• Increase reliability and overall availability
Approaches
• Mahalanobis Distance (see the sketch after this list):
• Measures the distance between a point and a distribution
• If the distance is above a certain threshold, the point is considered
anomalous
• Isolation Forest:
• Splits the space “randomly” until a point is isolated
• The probability of a point being non-anomalous is proportional
to the number of splits needed to isolate it
• Residual Autoregressive Score:
• Compute the norm between the actual time series and the one
predicted by an autoregressive model (e.g. ARIMA)
• If the gradient of the norm is monotonically increasing, the
time series is classified as anomalous
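To make the first approach concrete, here is a minimal Python sketch of Mahalanobis-distance anomaly scoring; the synthetic data, function names, and the 99.5th-percentile threshold are illustrative assumptions, not the deck’s actual implementation.

```python
import numpy as np

def mahalanobis_scores(X_ref, X):
    """Distance of each row of X from the distribution of X_ref."""
    mu = X_ref.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X_ref, rowvar=False))
    diff = X - mu
    # sqrt(diff @ cov_inv @ diff) computed row-wise
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 3))                     # nominal behaviour
X_test = np.vstack([rng.normal(size=(5, 3)),             # normal points
                    rng.normal(6.0, 1.0, size=(2, 3))])  # anomalous points
threshold = np.percentile(mahalanobis_scores(X_train, X_train), 99.5)
print(mahalanobis_scores(X_train, X_test) > threshold)   # last two flagged
```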
Solution
• Machine Learning based anomaly detection:
• Real-time monitoring with unsupervised detection
• Device-independent, with generalized algorithms
• Independent of environmental and periodic operational conditions
• State-of-the-art artificial intelligence algorithms
• Faults are predicted with significant lead time (i.e. days to weeks before failure)
• Anomalies of different types and threshold levels are detected
Failure Analysis Framework Architecture
[Pipeline diagram: noisy temperature signal as input → Extract, Load and Transform → engineered features as output]
• Our framework uses only the significant features of the devices
• i.e. only the temperature sensors for the collimators
• Extract Load Transform pipeline (sketched below):
• Identify and remove seasonal components from the signal
• Filter out the environmental noise (e.g. Gaussian smoothing)
• Derive additional features in order to gain more insight into the physical phenomena of
interest
• At the end, we homogenize the range and variability of the extracted data
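A minimal Python sketch of these four ETL steps, assuming an hourly temperature series; the window sizes, smoothing parameter, and derived features are illustrative assumptions rather than the deck’s actual pipeline.

```python
import pandas as pd
from scipy.ndimage import gaussian_filter1d

def preprocess(temp: pd.Series) -> pd.DataFrame:
    # 1. Remove the (assumed daily) seasonal component with a 24-sample rolling mean.
    deseasonalized = temp - temp.rolling(24, min_periods=1).mean()
    # 2. Filter out environmental noise with Gaussian smoothing.
    smoothed = pd.Series(gaussian_filter1d(deseasonalized.to_numpy(), sigma=3.0),
                         index=temp.index)
    # 3. Derive additional features (first difference, short-term variability).
    feats = pd.DataFrame({
        "temp": smoothed,
        "delta": smoothed.diff().fillna(0.0),
        "rolling_std": smoothed.rolling(12, min_periods=1).std().fillna(0.0),
    })
    # 4. Homogenize range and variability (z-score normalization).
    return (feats - feats.mean()) / feats.std()
```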
Neural Networks
• Neural Networks are computing systems inspired by the biological
brain
• They learn to perform tasks by considering examples, without being
programmed with task-specific rules
• The original goal of the neural approach was to solve problems in the same
way a human brain would; attention has since shifted to performing specific
tasks
McCulloch, W.S., Pitts, W., (1943), “A logical calculus of the ideas immanent in nervous activity”, Bulletin of Mathematical Biophysics.
Recurrent Neural Networks
• In handling generic sequences of data, simple feed-forward neural networks
have severe limitations, especially for:
• Sequences of variable length (e.g. environmental processes)
• Sequences with long-term dependencies
• Recurrent neural networks (RNNs) are a class of neural networks naturally
able to exhibit temporal dynamic behavior
D. E. Rumelhart, G. E. Hinton, and R. J. Williams, (1986), “Learning internal representations by error propagation”.
• RNNs are susceptible to vanishing gradients when processing long sequences
Recurrent Neural Networks
• Sharing parameters across time steps allows the model to extend to
samples of different lengths and to generalize across them:
• The output is produced by applying the same update rule to the previous
outputs
• RNNs introduce the concept of cycles in the computational graph:
• Cycles model the influence of the value at time t on the value at time t + 𝜏
I. Goodfellow, Y. Bengio, and A. Courville, (2015), “Deep Learning”.
• The hidden state at time t can be considered a summary of all the previous
values processed by the RNN (see the sketch below)
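A minimal NumPy sketch of this recurrence: the same parameters are applied at every step, and the final hidden state summarizes the whole sequence. Shapes, names, and weight scales are illustrative assumptions.

```python
import numpy as np

def rnn_states(x_seq, W_xh, W_hh, b_h):
    """x_seq: (T, n_in); returns all hidden states, shape (T, n_hidden)."""
    h, states = np.zeros(W_hh.shape[0]), []
    for x_t in x_seq:                       # same update rule at every step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
params = (0.1 * rng.normal(size=(8, 4)),    # W_xh: input-to-hidden
          0.1 * rng.normal(size=(8, 8)),    # W_hh: hidden-to-hidden (the cycle)
          np.zeros(8))                      # b_h
h_all = rnn_states(rng.normal(size=(50, 4)), *params)
h_final = h_all[-1]                         # summary of the whole sequence
```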
LSTM Networks
• The forget gate’s vector of activations lets the network better control the
gradient values, preventing them from vanishing (see the sketch below)
• Well suited to classifying, processing, and making predictions based on time
series, since there can be lags of unknown duration between important events
in a time series
• LSTM Networks are a type of recurrent model that has been used in many
sequence-learning tasks such as speech recognition and time series forecasting
Sepp Hochreiter, Jürgen Schmidhuber, (1997), “Long short-term memory”, Neural Computation.
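To make the gating mechanism concrete, here is a minimal NumPy sketch of a single LSTM step in the standard formulation; the dictionary-based parameter names are illustrative assumptions, not the deck’s code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step; W, U, b are dicts of parameters for gates f, i, o, g."""
    f = sigmoid(W["f"] @ x + U["f"] @ h + b["f"])  # forget gate
    i = sigmoid(W["i"] @ x + U["i"] @ h + b["i"])  # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h + b["o"])  # output gate
    g = np.tanh(W["g"] @ x + U["g"] @ h + b["g"])  # candidate cell update
    c_new = f * c + i * g       # f near 1 keeps the gradient path through c open
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```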
Bidirectional Networks
• Bidirectional LSTM Networks are built by putting two independent
networks together (see the sketch below)
• The input sequence is fed in normal time order to one network and in
reversed time order to the other
• This structure gives the network both backward and forward
information about the sequence at every time step
M. Schuster and K. K. Paliwal, (1997), “Bidirectional recurrent neural networks”, IEEE Transactions on Signal Processing.
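A minimal NumPy sketch of the bidirectional idea, reusing `rnn_states` from the earlier RNN sketch: one pass in normal time order, one on the reversed sequence, with the states concatenated at every step. Shapes are illustrative assumptions.

```python
import numpy as np
# Reuses rnn_states(x_seq, W_xh, W_hh, b_h) from the earlier RNN sketch.

def bidirectional_states(x_seq, fwd_params, bwd_params):
    h_fwd = rnn_states(x_seq, *fwd_params)              # normal time order
    h_bwd = rnn_states(x_seq[::-1], *bwd_params)[::-1]  # reversed, then re-aligned
    # Every time step now carries both past (forward) and future (backward) context.
    return np.concatenate([h_fwd, h_bwd], axis=1)       # (T, 2 * n_hidden)
```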
Autoencoders
• The principal aim is to learn a representation of a set of data by training the
network to ignore signal noise
• By learning to replicate the most salient features of the data, the model is
encouraged to precisely reproduce the most frequent characteristics:
• When facing anomalies, the model should fail the reconstruction
• The reconstruction error of a data point is used as an anomaly score to detect
anomalies
Kramer, M.A., (1991), “Nonlinear principal component analysis using autoassociative neural networks”, AIChE J.
Autoencoders
• Formally, Autoencoders are feedforward neural networks whose aim is
to learn to copy their input to their output: ℐ(𝐱) = 𝐱
• As it is, the identity function itself is not very useful, unless we force the model to
prioritize which aspects of the input should be copied, leading the model to
learn useful properties of the input data
• The main idea is to split the identity function into two parts: ℐ(𝐱) = 𝑔(𝑓(𝐱))
• 𝐡 = 𝑓(𝐱) is the encoder function, which maps the input to an internal
representation, or code
• 𝐫 = 𝑔(𝐡) is the decoder function, which produces a reconstruction of
the input from the code produced by the previous mapping
• To capture the most salient features of the input, the encoder and decoder
functions must be of the form:
𝑓: ℝⁿ → ℝᵐ, 𝑔: ℝᵐ → ℝⁿ, with 𝑚 < 𝑛
Autoencoders
• An Autoencoder whose code dimension is less than the input dimension is
called undercomplete
• The learning process of an undercomplete autoencoder is described as:
𝐖* = argmin_𝐖 𝐿(𝐱, 𝑔(𝑓(𝐱))) = argmin_𝐖 ‖𝐱 − 𝑔(𝑓(𝐱))‖²
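A minimal Keras sketch of an undercomplete autoencoder trained with exactly this objective (mean squared error between 𝐱 and 𝑔(𝑓(𝐱))); the layer sizes are illustrative assumptions.

```python
import tensorflow as tf

n, m = 32, 8                                     # input dim n, code dim m < n
inp = tf.keras.Input(shape=(n,))
code = tf.keras.layers.Dense(m, activation="relu")(inp)    # encoder f
out = tf.keras.layers.Dense(n, activation="linear")(code)  # decoder g
autoencoder = tf.keras.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")  # ‖x − g(f(x))‖² objective
```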
Autoencoders
• Using proper activation functions leads the autoencoder to learn a
generalization of principal component analysis under a kernel function:
• If the neuron activations are linear and the loss function is the
mean squared error, it can be shown that the encoder learns an
approximation of principal component analysis
• There are several variants of the undercomplete autoencoder:
• Regularized Autoencoders:
• The loss function has additional terms that force the model to have other
properties besides the ability to reproduce the identity function
• Sparse Autoencoders:
• The loss function has an additional sparsity penalty 𝛀(𝐡) (usually the L1 reg.)
which induces sparsity in the code
• Denoising Autoencoders (see the sketch after this list):
• Before the training phase, the training data is altered by some form of noise:
• The model becomes more stable and robust to the induced corruption
• Variational Autoencoders:
• Perform variational inference on the distribution of the training data ☠
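A minimal sketch of the denoising variant, reusing the `autoencoder` from the previous sketch: the inputs are corrupted with Gaussian noise while the training targets stay clean. The noise level and the random `X_train` array are illustrative assumptions.

```python
import numpy as np
# Reuses `autoencoder` (input dim 32) from the previous sketch.

def add_noise(X, sigma=0.1, seed=0):
    # Corrupt the inputs; the training targets remain the clean data.
    return X + np.random.default_rng(seed).normal(0.0, sigma, X.shape)

X_train = np.random.default_rng(1).normal(size=(512, 32)).astype("float32")
autoencoder.fit(add_noise(X_train), X_train, epochs=10, batch_size=64)
```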
How to detect extreme rare events
• A Bidirectional Long Short-Term Memory Autoencoder addresses this with a modelling
approach that:
• Learns and reconstructs the nominal behaviour of a time series
• Uses the signal-versus-reconstruction error to detect anomalies
• Devices’ normal behaviour is often affected by external factors or variables that are not
evident when analysing signal behaviour over time, due to:
• Unmonitored or unknown environmental conditions
• Additional harsh conditions that add noise, e.g. radiation dose
• Measurement and data acquisition errors, i.e. the difference between the measured value of a quantity and
its true value
Extreme rare events detection
[Diagrams: LSTM Autoencoder and Bidirectional LSTM Autoencoder architectures]
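A minimal Keras sketch of a Bidirectional LSTM Autoencoder of the kind shown in the diagram; the window length, feature count, and layer sizes are illustrative assumptions, not the deck’s actual configuration.

```python
import tensorflow as tf

T, n_features, n_units = 48, 3, 16       # window length, features, LSTM units

inp = tf.keras.Input(shape=(T, n_features))
# Encoder: a bidirectional LSTM compresses the window into a code vector.
code = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(n_units))(inp)
# Decoder: repeat the code T times and unroll it back into a sequence.
x = tf.keras.layers.RepeatVector(T)(code)
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(n_units, return_sequences=True))(x)
out = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_features))(x)

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")   # reconstruction error objective
```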
Anomalies as Outliers in the Reconstruction Error Distribution
• The model learns how to encode the normal behavior of our devices
• We set a threshold as an extreme value of the distribution of the reconstruction
errors on data we assume to be anomaly-free (sketched below)
• Points whose reconstruction error exceeds the threshold are anomalies
[Plots: reconstruction error under normal behavior vs. abnormal behavior]
• Subsequent anomalies trigger a critical alarm on the monitoring framework
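A minimal sketch of this thresholding rule, reusing `model`, `T`, and `n_features` from the previous sketch: the threshold is fitted as an extreme percentile of reconstruction errors on windows assumed to be anomaly-free. The percentile and the synthetic arrays are illustrative assumptions.

```python
import numpy as np
# Reuses `model`, `T`, and `n_features` from the previous sketch.

def reconstruction_errors(model, X):
    # Mean squared error per window between input and reconstruction.
    return np.mean((X - model.predict(X, verbose=0)) ** 2, axis=(1, 2))

rng = np.random.default_rng(0)
X_nominal = rng.normal(size=(256, T, n_features)).astype("float32")  # assumed anomaly-free
X_live = rng.normal(size=(32, T, n_features)).astype("float32")      # monitored windows

threshold = np.percentile(reconstruction_errors(model, X_nominal), 99.9)
is_anomaly = reconstruction_errors(model, X_live) > threshold
```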
Results
[Plots: left, no anomalies detected and the temperature never exceeds the critical level; right, anomaly detected on 06-19-2019, with the temperature exceeding the threshold on 07-23-2019]
Conclusions & Future developments
• With these techniques we have shown that it is possible to predict well in advance
whether a device deviates from its nominal behavior, thus predicting a potential fault
• It is also possible to assess the criticality of the detected anomaly
• We aim to generalize the framework to a multi-system anomaly detector, covering
the following systems:
• Uninterruptible Power Supplies and, more generally, batteries ✔
• Collimators ✔
• Electrical Transformers ✘
• Hydraulic pumps ✘
• Compressors ✘
• Future features will also be added to infer the type of anomaly and its extent
(sensor, component, sub-system, system)
References
Zhang, C., et al., (2018), “A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis
in Multivariate Time Series Data”, arXiv:1811.08055v1.
Marchi, E., et al., (2015), “A Novel Approach for Automatic Acoustic Novelty Detection Using a
Denoising Autoencoder with Bidirectional LSTM Neural Networks”, ICASSP.
Sakurada, M., Yairi, T., (2014), “Anomaly Detection Using Autoencoders with Nonlinear
Dimensionality Reduction”, ACM.
Zhou, C., Paffenroth, R. C., (2017), “Anomaly Detection with Robust Deep Autoencoders”, KDD.
Malhotra, P., et al., (2016), “LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection”, ICML.
Gong, D., et al., (2019), “Memorizing Normality to Detect Anomaly: Memory-augmented Deep
Autoencoder for Unsupervised Anomaly Detection”, arXiv:1904.02639v2.
alDosari, M. S., (2016), “Unsupervised Anomaly Detection in Sequences Using Long Short Term
Memory Recurrent Neural Networks”.
Thank you!
