SlideShare a Scribd company logo
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
DOI: 10.5121/ijbes.2021.8401 1
A DEEP MODEL FOR EEG SEIZURE
DETECTION WITH EXPLAINABLE AI USING
CONNECTIVITY FEATURES
Hmayag Partamian, Fouad Khnaisser, Mohamad Mansour,
Reem Mahmoud, Hazem Hajj and Fadi Karameh
Department of Electrical and Computer Engineering,
American University of Beirut, Beirut, Lebanon
ABSTRACT
During seizures, different types of communication between different parts of the brain are characterized by
many state of the art connectivity measures. We propose to employ a set of undirected (spectral matrix, the
inverse of the spectral matrix, coherence, partial coherence, and phase-locking value) and directed
features (directed coherence, the partial directed coherence) to detect seizures using a deep neural
network. Taking our data as a sequence of ten sub-windows, an optimal deep sequence learning
architecture using attention, CNN, BiLstm, and fully connected neural networks is designed to output the
detection label and the relevance of the features. The relevance is computed using the weights of the model
in the activation values of the receptive fields at a particular layer. The best model resulted in 97.03%
accuracy using balanced MIT-BIH data subset. Finally, an analysis of the relevance of the features is
reported.
KEYWORDS
Seizure detection, deep sequence learning, brain connectivity, explainability, feature relevance.
1. INTRODUCTION
Epilepsy is a neurological disorder that affects around 50 million people of all ages worldwide
[1]. It is characterized by the frequent and repetitive occurrence of seizures that disrupt normal
function and affect the quality of life of the patient.
Neuroimaging techniques acquired from epilepsy patients showed evidence of different structural
and functional irregularities. Synchronous discharges of electrical activity across different parts
of the brain during seizures were captured using electroencephalogram (EEG) data [2]. To extract
meaningful information, a plethora of measures have been designed to characterize the changes
in the EEG signals during epilepsy that can be classified into two types. Univariate metrics
measure the information in a window of one time series and can be classified as temporal,
spectral, and entropy-based measures [3] [4] [5]. On the other hand, multivariate connectivity
metrics [6] [7] [8], characterize information between multiple time series and can be directed
(effective) or undirected (functional). For example, the phase-locking value (PLV) is an
undirected measure since it quantifies the synchrony between a pair of signals [9]. Granger
Causality (GC) calculates the amount of information transfer from one channel to the other which
makes it a directed measure [10] [11] [12]. Cross-frequency coupling (CFC) methods have also
been used in characterizing information between different brain regions by studying the
interaction between oscillations across different frequency bands such as phase-amplitude
coupling (PAM) [13].
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
2
Ictal episodes are those that exhibit seizure activity and will have different connectivity values
when compared to non-ictal periods [14]. Also, seizure onset zones usually get isolated in their
activity before the seizure starts which can be captured using a coherence connectivity matrix
[15]. Electrophysiological research also reports that functional connectivity analysis allows
localization of seizure onset zones (SOZ), whose exact location helps increase surgery success
rates [3]. Besides, variations in phase-locking value (PLV) can characterize the synchronous
activity between different parts of the brain during a seizure and non-seizure episodes [9].
In the last decade, advances in technology have made machine learning and big data analysis
available and many seizure detection algorithms were developed using diverse machine learning
algorithms. Support vector machines (SVM) is one of the common techniques used to learn
classification algorithms for seizure detection [16] [17] [3]. Usually, many classical tailored
features are fed to the SVM classifiers. Therefore, employing unnecessary features may hinder
both the performance and the speed of analysis [18]. Feature selection techniques were proposed
to reduce the feature set since machine learning algorithms often perform better when relevant
features are selected and used for learning [19]. An alternative branch of machine learning is
deep learning (DL) which studies multiple-layer architectures of neural networks that can learn
discriminating features and a classifier simultaneously [20]. The raw EEG data, snapshot images,
different univariate and bivariate measures were used as input to deep classifiers to detect
seizures and non-seizure episodes during epilepsy [21] [22] [23]. However, deep networks are
perceived as “black box” techniques since the role of the different layers and the overall internal
functioning are unknown. A major issue arises because scientists need to understand why such a
decision was made. Can such models be trusted without knowing why they fail and why they
succeed? Explainable artificial intelligence (XAI) [24] techniques try to derive explanations
from the parameters of the deep network to infer knowledge and build explainable features [13].
In epilepsy analysis, seizure detection is an important problem, however, for doctors to be able to
diagnose epilepsy, they need deeper information about the interactions of the brain regions such
as the localization of SOZ which can help identify a resection area for surgery. Experimental
studies have shown the phase-amplitude coupling [13] and the power in the high-frequency band
(>100Hz) [25] increase in the seizure onset zone and during the seizures. The SOZ also gets
disconnected from the rest of the brain regions that can be inferred using undirected connectivity
measures such as PLV [26]. Another study shows that the phase-lag index (PLI) connectivity in
the θ (4-8 Hz) band can be linked to tumor-related epilepsy [27]. Explainability during seizure
analysis were also addressed to find the feature relevance at different frequency bands [28] and
extract information from the learned weights to derive topographic brain maps [28] [29].
In this study, we selected a subset of the different connectivity measures to characterize the brain
signals from different perspectives. spectral matrix (SM), the inverse of the spectral matrix (IS),
coherence (COH), partial coherence (PC), and phase-locking value (PLV) as undirected
measures, and directed coherence (DC), and the partial directed coherence (PDC) as directed
features. We feed the chosen features to learn a deep model that not only classifies seizures but
also computes how much each of the features has participated in the decision made by the
detector using the deep learning model parameters. The weights of the model provide
explainability at the level of the features. This is of key importance for epilepsy analysis since it
provides the user with additional valuable information. Also, seizure events differ from one user
to the other and the model will be generally trained using different patient data that have different
seizure dynamics. The types of epilepsy in the training set can also be different and may exhibit
different connectivity values [30]. Therefore, when new data is tested, the proposed model will
detect whether a specific interval of EEG is a part of a seizure or not and also outputs the
percentage of the impact of each of the employed connectivity measures on the model’s decision.
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
3
We summarize our contributions as follows:
 Unlike classical methods where they use a single window, our model considers a 20-
second window with ten 2 second sub-windows to characterize interdependencies
between these windows through time and better describe seizure and non-seizure
intervals of the EEG data.
 The model is designed to perform seizure detection of the 20-second data using attention,
convolutional neural networks (CNN), fully connected layers (FCNN), and bidirectional
long short term memory (BiLstm) methodologies which is a type of a recurrent neural
network (RNN).
 In [37], different fusion methods were employed while building the architecture of the
models. We also employed different fusion methods and compared them to understand
relationships between features and the output.
 Finally, we infer from the weights of the network the relevance of the connectivity
measures on the decision made by the detector. We study the impact of the features on
the output generally across all patients as well as per patients. To the best of our
knowledge, no other work has studied explainability with connectivity analysis during
seizure classification.
The rest of the paper is organized as follows. In section 2, we present all the related work for this
study. In section 3, we explain our methodology providing the different designs and workflows of
the proposed method. In section 4, a series of experiments are conducted to evaluate the
performance of our design. In section 5, we discuss our findings, its limitations, and propose
possible future extensions. We finally conclude in section 6, providing a summary of our work.
2. RELATED WORK
Two tasks arise while working with epileptic data, classifying the sample as ictal or non-ictal or
classifying the window as preictal (the period that precedes an ictal phase) or inter-ictal (period of
normal activity). Even though we will tackle the ictal/non-ictal classification problem, studying
the preictal/inter-ictal problem is useful as it gives us intuition and inspiration to develop our
architecture. The state of the art for preictal and inter-ictal classification is produced by Daoud et
al. [31] where they describe a methodology to train, using the raw multichannel EEG data, a deep
convolutional autoencoder (DCAE) and uses the latent space representation i.e. (output of the
pre-trained encoder) of each recording as input to a BiLstm network that classifies the example
into preictal and inter-ictal. The Dataset used is the CHB-MIT EEG dataset recorded at
Children’s Hospital Boston and is publicly available. To narrow down the channels considered
(23 total channels), an iterative algorithm was used to select the channels. The algorithm
calculates the product between variance and entropy for each channel and iteratively trains the
model on bigger windows until all the channels are considered to select the best combination of
performance accuracy and computational cost. Through this approach, an accuracy of 99.66% is
achieved.
As for ictal/non-ictal prediction, Akbarian et al. [32] use effective brain connectivity, the directed
transfer function (DTF), the directed coherence (DC), and the generalized partial directed
coherence (GPDC) in various frequency bands, to measure the relation between brain regions.
They extracted features from each of these measures using graph theory. They fed each set of
newly extracted features to an autoencoder (AE) for feature reduction then used a softmax on the
output of the encoder, obtaining 3 classifiers. The decision of ictal/non-ictal is then inferred
through majority voting. Through this method and using the CHB-MIT dataset, they achieved a
99.43% accuracy.
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
4
Table 1 presents an overview of different architectures and their respective task-specific
accuracies. We also include the features used by each study. When using different features we
need to combine them to feed them to the network, different types of combinations/fusions exist
such as concatenating them at the input layer, or feeding them through separate networks and
concatenating their respective outputs and use them as input to the final output layer. Fusion
methods were applied in EEG with the Schizophrenia detection problem where they showcase the
effect of varying fusion mechanisms on the performance of the deep network [39]. Another
Table 1. Comparable Analysis
Ref. Task Feature
Selection
Deep Architecture
Acc. XAI
[23] Seizure Detection Raw, Spectral,
temporal, EEG
snapshot, spectrogram
FCNN, RNN, DNN
99.7% No
[28]
[29]
Seizure Detection
Seizure Detection
Raw Data
Raw Data
CNN
CNN, Attention,
FCNN
98.05%
-
Yes
Yes
[31] Seizure
Prediction
Raw data AE+ BiLstm 99.66% No
[32] Seizure Detection Directed Connectivity
and Graph Metrics
DNN 99.43% No
[37] Video Detection Trajectory Features DNN 93.33% Yes
[39] Schizophrenia
Detection
Mixed Connectivity CNN 91.69% No
[41] Seizure Detection Raw Data CNN, BiLstm 98.89% No
Proposed
Method
Seizure
Detection
7 Connectivity
Measures
CNN, BiLstm,
Attention, FCNN
97.03% Yes
approach for epilepsy detection uses the raw EEG data as input to a bidirectional recurrent
network that uses the past window knowledge to predict the next window’s label. The method
can discriminate normal-ictal and normal-ictal-interictal EEG signals accurately [41].
New techniques are currently developing and shaping into a field of study called explainable AI
(XAI) [24] where researchers try to make use of the learned blocks of information inside the deep
learning models. A modified deep learning model will also learn explainable features while
training. The first efforts employed the deep CNN using deconvolution methods to explain the
feature maps. Parts of the image that activated certain neurons were marked during the process
[33]. Another XAI algorithm, LIME (Local Interpretable Model-Agnostic Explanations), finds
for every test sample the relevance of a particular learned feature for a specific output using a
local approximation with a linear sparse model [34]. Class activation mapping (CAM) is another
method for saliency map generation primarily used for object localization in the image. CNN
with a final Global average pooling (GAP) layer was constructed and the weights of the GAP
layer were employed to compute heat maps showing the localization of objects in an image [35].
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
5
Better results were obtained with Grad-CAM++ which is a variation of CAM that uses the
gradient of the output class with respect to the activation of the feature maps to find better
saliency maps[36]. Roy et al. proposed a task aware selection of features by learning a deep
neural network (DNN) for action recognition using video as input. They extract a 426 trajectory
and motion features, and after learning the DNN, they study the activation potential normalized
over all layers to quantify feature relevance. The authors employ the first layer activation
functions to define a contribution measure for each feature [37]. Feature relevance was also
studied during Parkinson disease using data mining techniques [38].
Adversarial representation learning methods were employed for robust general seizure detection
models. In one study, a deep CNN model that employs 2-second window raw data as input was
analysed where the authors employ the weights of the learned model to visualize internal
functions of the network and extract feature maps. Using the maps as receptive fields in the
intermediate layers, they investigate domain-specific knowledge and class-discriminative features
using correlation maps in different frequency bands which were further processed to construct
scalp topographies [29]. None of these methods can provide deeper insights about the data. Our
proposed method is designed to keep track of the used features and provide explainability of the
measures used. The features that trigger the decision of the model will be revealed and this will
provide valuable information that can be related to the type of disease or type of interactions
during seizure and non-seizure episodes. We will also show that different seizure patients exhibit
diverse feature relevance maps which can be used for further analysis.
3. PROPOSED METHODOLOGY
3.1. Overview of the method
As depicted in figure 1, epileptic EEG data is segmented into short-duration fixed length
windows from which we extract the five common brain rhythms δ (2−4 Hz), θ (4−8 Hz), α (8−13
Hz), β (13−30 Hz), γ (greater than 30Hz) using Butterworth bandpass filters. For each of these
rhythms, we compute seven connectivity measures: Spectral matrix, inverse of spectral matrix,
coherence, partial coherence, directed coherence, partial directed coherence, and phase-locking
value.
Figure 1. The overall Workflow
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
6
These connectivity measures are arranged in a tensor and fed as features to a deep seizure
detection network. We evaluate and compare four different deep learning models by
manipulating the fusion mechanism. Since our features represent different perspective
connectivity measures, we intend to benefit from the rich information found in these features.
Similar to [37], we use the activation values of the learned model to output the type of
connectivity that quantifies participation percentage in making the model make a certain decision.
3.2. Data Preparation and Processing
The raw data were first pre-processed to extract 20-seconds long windows of seizure and non-
seizure data which were manually extracted using the labels provided with the data. Each of these
signals is divided further into ten 2-seconds long sub-windows resulting in a sequence of 10
seizure sub-windows that are fed to bandpass filters to extract five rhythms for each.
3.3. Feature Extraction
Taking our channels as a multivariate process , the multivariate linear shift-invariant filter
representation can be expressed by:
(1)
where is a vector of zero mean inputs and is matrix representing a filter
impulse response.
On the other hand, the multivariate autoregressive (MVAR) model of order can be expressed
as:
(2)
where can be considered as uncorrelated zero mean Gaussian noise. This model can allow
defining interactions between different signals such as coupling and causality using the matrices
since the term quantifies the causal linear interactions between and at lag k.
In the frequency domain, using the Fourier transform, the above equations yield
and . By comparing the two spectral
representations above, one can derive the following relation: .
The cross-spectral density matrix S(f) and its inverse P(f) are defined by
and where the superscript represents the
Hermitian transpose and represents the covariance of The coherence between the two
signals and at a frequency f is can now be derived as:
(3)
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
7
while the directed coherence can be expressed by:
(4)
where represents the variance of signal .
The partial coherence can also be derived in a similar fashion and can be represented by:
(5)
from which we can infer the partial directed coherence, PDC, defined by:
(6)
On the other hand, phase locking value is a measure that quantifies the synchrony between two
signals. The signals and are first band pass filtered in the specified frequency bands, and
then Hilbert transform is applied to extract the corresponding phases and . The phase
locking value (PLV) can be expressed as:
(7)
where N is the number of samples considered per window.
3.4. Deep Learning Model Architectures
The data formed is in the form of a tensor of dimension 7x10x19x19x5. Each data sample takes
in 10-time windows of the 7 connectivity features of 2-second intervals, each window is
represented by a 19x19x5 matrix where the third dimension represents the frequency band and
19x19 is the connectivity matrix. Each row and column represent an EEG channel so one entry in
this matrix represents the connectivity of the channel along the row with the channel along the
column in one of the 5 different frequency bands. In our four architectures, all convolution
operations use one filter because we are trying to capture a numerical combination of the input
matrix and not a characteristic (i.e. shade or edges) since it’s not an image. Furthermore, they
have a similar base architecture, but they use different schemes to combine the different features.
For our first architecture shown in figure 2, we separate the 7 features and the five frequency
bands of each feature, resulting into 35 independent inputs, and feed them into separate, identical
blocks as depicted in figure 2A. We process each input as a time series so each window passes
through one 2D convolution layer of kernel size of (19,1) . This operation helps us to condense
all the relations that a channel has with the other into one number. Since we obtained one feature
vector per window, we feed them to an LSTM block followed by a self-attention layer and fully
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
8
connected layers. Each one of the 35 inputs pass through this block and at the end, the vectors
obtained from the last FCNN layers are concatenated and fed to an FCNN layer for classification
as shown in figure 2B. This fusion scheme assumes total independence of the features and
frequency bands, the feature vectors are concatenated before the last FCNN layer. The second
fusion scheme, instead of combining the feature vectors in the last layer, we combine the vectors
obtained after the attention layer of each feature and feed them to FCNN layers as shown in
figure 3. For the third fusion scheme, after having a feature vector of size 19 for each window for
each feature we concatenate the feature vectors of similar time steps together, obtaining a 19x7
matrix then we proceed with a 2D convolution layer of kernel size (1,19) to get one feature vector
of size 19. We obtain channel-wise combination of the different features which we feed into an
LSTM block followed by an attention layer and FCNN layers (figure 4). Figure 5 depicts the
fourth fusion scheme, after having obtained a 19x19 matrix for each window of each feature we
concatenate the feature matrices of similar time steps together, obtaining a 19x19x7 matrix for
each time step then we proceed with a 3D convolution layer of kernel size (1,1,7) to get a 19x19
matrix in order to get a frequency wise combination of the different features and then proceed
with a 2D convolution layer of kernel size (1,19) followed
Figure 2. Model 1 Architecture
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
9
Figure 3. Model 2 Architecture
Figure 4. Model 3 architecture
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
10
Figure 5. Model 4 architecture
by an LSTM block, an Attention layer, and FCNN layers. Thus, our 4 fusion schemes aim at
reducing the level at which relationships between the features are the strongest. The first scheme
assumes no relation between features and frequencies, the second assumes a high-level relation,
the third assumes a channel-wise relation and the fourth one assumes a frequency wise relation.
3.5. Feature Relevance
For the XAI part, we planned to address one of the fundamental neuroscientific questions
concerned in finding relevant temporal and spatial scales necessary for given behaviour [28] by
doing a statistical study for the relationships between the derived connectivity measures varying
between spectral, causal, and phase-related, that characterize the brain signals on one hand and
the seizure-nonseizure for patient-specific and cross-patient cases on the other hand. We aim at
finding some link between the explained features contribution results to some of the scientific
facts concerning the seizure detection and neurology fields.
To apply XAI, we first targeted our research on the pre-processing and extracting different
connectivity measures (SM, ISM, DC, C, PDC, PC, PLV) across different frequency bands
) that we believe, based on some prior knowledge and experimental studies, have a
direct impact on seizure detection [13][25][26][27]. Then, we tried to achieve our explainability
in the post-modelling stage using the input-based explanation drivers methods where we base our
feature study on the output predictions [42]. As we see in the structure of model 2, we tried
separating each feature’s CNN-LSTM unit alone and concatenate them at a later stage, in such a
way we can simply do our study using the concatenation and the first dense layer only. This
allows us to keep track of the features.
The concatenation layer will combine the outputs from the 7 features separated CNN-LSTM units
into one flatten layer as depicted in figure 6 as vector . Our further investigation was based on
the feature extraction paper [37] at the first dense layer. To get the relevance of each of the input
neurons of the flatten layer, we first calculated the average absolute activation potential
contributed by the dimension of the input:
(8)
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
11
where the activation is: .
Then, we find the relative contribution of the input dimension towards the activation of
the hidden neuron:
(9)
In order to get the total net contribution is of an input dimension overall hidden layers, we then
computed the net of all ’s for every input overall :
(10)
Since, our input is a set of neurons per feature in order of the sequence SM, ISM, DC, C, GDPC,
PC, PLV respectively, then we further summed each of that sets alone to get the net per
feature. According to the feature extraction paper, the higher the contribution of an input
dimension, the more likely it is its participation in hidden neuronal activity and consequently,
classification [37].
Figure 6. Last layer architecture weights and activations from which relevance of features are computed
To begin our study, we first input the whole pre-processed dataset, cross-patient study, for seizure
and non-seizure cases separately in the first part of the model, the layers before the concatenation
part, and extracted the embeddings or the 400 neuron per feature and then got our results
according to the technique described above. The analysis is further extended to find inter-patient
feature relevance variations.
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
12
4. EXPERIMENTAL RESULTS
4.1. Experimental Setup
CHB-MIT is an EEG dataset collected at Children’s Hospital in Boston. It has 24 cases of
epilepsy. EEG was acquired using the international 10-20 system sampled at 256 samples per
second with a 16-bit resolution. Overall 198 seizures are annotated with the beginning and the
end of seizures. This particular dataset was chosen because it was used in many state of the art
papers which allows us to compare our deep learning model’s performance. As shown in figure 7,
we extract from this dataset 20 second intervals from seizure and non-seizure episodes. The
seizure of length less than 20 seconds, which were very few, were ignored in this study and the
seizure episodes of duration greater than 20 seconds were dissected into 20 second intervals. The
remainders were also considered by taking 20 seconds from the end. As for the non-seizure
episodes, four 20-second intervals were taken randomly from every dataset. The total number of
20 episode intervals collected contained 543 seizure intervals and 801 non-seizure intervals.
4.2. Validation Metrics
To evaluate the performance of the models, we use the statistical measures of binary
classification. We denote the seizure label as the positive class and the non-seizure case as the
negative class. We first define the following terms by
 True Positive (TP) : number of hits, correctly classified positives
 False Positive (FP): number false alarms, classified as seizure while it actually has no
seizure.
 False Negative (FN): number of misses, classified as non-seizure while it actually is
seizure.
 True Negative (TN): number of correct rejections, correctly classified negatives.
Sensitivity measures the percentage of positive class members that are correctly identified and is
given by:
(11)
Specificity gives the percentage of negative class members that are correctly identified whose
formula is:
(12)
Precision finds the positive predictive rate which explains how much of the positives were
identified.
(13)
Finally, the accuracy is computed using:
(14)
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
13
4.3. Feature Extraction
Each of these 20-second intervals was first filtered in the five frequency bands described above
and further dissected into ten 2-second intervals each of which is processed to extract the seven
described features. The ten sub-intervals are then gathered into a single tensor resulting in a
tensor of size 7x10x19x19x5 as shown in figure 7.
Figure 7. Data preparation and feature extraction
4.4. Performance Results of the proposed models
The models were evaluated on a data split of 85% training and 15% testing. After fine tuning, the
first model performed the best with a 97.03% accuracy as you can see in Table 2. The results also
show that the earlier the fusion scheme the weaker the relation between the features. Since our
data dissection method is totally new, no quantitative comparative analysis was performed. We
train all our models twice, the first time non-seizure data has a label of 1 and seizure data has a
label of 0 and the second time, using the resulting weights from the first time, the labels are
flipped, that procedure forces the model to learn more robust features for each class. The data
was too big to load into memory, so we had to write our own data generator that fetches batches
of 32 samples from the folder. The hyper parameters for each model are as follows:
 Model1: a relu activation function for both dense and convolution layers, 2 LSTM layers
and 1 dense layers of 100 neurons for each block and a dropout rate of 0.5. The optimizer
used is Adam and 30 epochs for both training phases.
 Model2: a relu activation function for both dense and convolution layers with a Spatial
Dropout2D of 0.07, we use 2 LSTM layers and all dense layers have Dropout of 0.5. We
use 2 dense layers of 263 and 20 neurons respectively. The optimizer used for the first
training phase was RMSprop for 17 epochs, after the flip Nadam was used as optimizer for
8 epochs.
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
14
 Model3: a relu activation function for both dense and convolution layers, 2 LSTM layers
and 3 dense layers of 500 neurons for each block and a dropout rate of 0.5 and l2
regularization. The optimizer used is Adam and 80 epochs for both training phases.
 Model4: a relu activation function for both dense and convolution layers, 2 LSTM layers
and 3 dense layers of 200 neurons for each block and a dropout rate of 0.3 and l2
regularization. The optimizer used is Adam and 100 epochs for both training phases.
The performance of the different modes on training and testing data are tabulated in table II. The
results are the average of ten runs with different splits each time. The overall accuracy,
sensitivity, specificity, and the precision are shown. While the model is well fit to the training
data in all models, unseen testing data performance is considered to choose the best model.
Model 1 performs the best in all measures recording a sensitivity of 97.65%, a specificity of
96.58%, precision 95.4%, and an overall accuracy of 97.03%.
Table 2. Performance Metrics Across all proposed models
Data Sensitivity Specificity Precision Accuracy
Model 1 Training 100.00 99.85 99.78 99.91
Model 1 Testing 97.65 96.58 95.40 97.03
Model 2 Training 99.13 98.10 97.22 98.51
Model 2 Testing 94.67 96.06 93.42 95.54
Model 3 Training 99.79 100.00 100.00 99.91
Model 3 Testing 80.23 92.24 88.46 87.13
Model 4 Training 96.08 96.49 94.84 96.32
Model 4 Testing 77.22 86.18 78.21 82.67
4.5. Feature Relevance Results
Figure 8 represents the overall feature relevance of all the data. We can see that spectral matrix
and partial coherence have higher relevance during seizures on average while non-seizure
decisions are made more effective with the coherence and the partial directed coherence.
Figure 8. Average feature relevance across all data
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
15
Cross-patient test results show the variation of the feature relevance diagrams for each of the tests
as can be seen in figures 9 and 10. We notice that every patient had different feature relevance
plot than the other, such that there is a different way in assigning the relevance and weights of
each feature which can be interpreted as a validation of the scientific fact that EEG patterns in
seizure patients are highly variable across patients. On the other hand, there were some changes
across the same patient results which can be explained on basing the features on both time and
frequency domains and EEG seizure pattern data are highly dynamic in nature for the same
patient.
Fig. 9. Feature relevance for patient 15
Figure. 10 Feature relevance for patient 20
Another interesting finding is that the directed coherence feature is often assigned with the least
net contribution The phase-locking value is often assigned with good relevance value in
seizure and non-seizure compared to other features which might be an indication for the fact that
PLV is one of the undirected connectivity measures that can infer the disconnection of the parts
of the brain where the seizure onset zones get disconnected [26].
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
16
5. DISCUSSION AND FUTURE WORK
EEG data analysis has been studied using EEG data and many of the techniques rely on the data
dissection and the features used. In this study, we adopted a sequential portion of EEG data
which contains information about a sequence of ten consecutive EEG sub-windows. This choice
was made because during seizures, the signals may alternate from seizure to non-seizure states.
Also, during non-seizure data, short bursts of seizure activity may arise and can be labelled as
seizure if taken separately. The BiLstm structure between these ten windows can learn
relationships between the windows during a seizure and avoid such misclassifications. Besides,
seizure data is generally noisy and stochastic in nature, and having relating higher-level
information between consecutive windows can help learn more complex relationships across
time.
Our study focuses on discriminating seizure/non-seizure while providing explanations of our
learned model which make it more presented different CNN-Lstm models for detecting seizure
based on long windows. Our models used different fusion strategies where the first two models
combine the features at the decision level, and the two combined them at the input level. We were
able to show that combining features at the decision layers yield in a much better performance
which can be an indication that the features are better learned when separated in the feature
extraction part. In our study, we made use of the latest techniques and advancement mainly in the
regularization and the normalization methods that helped our architecture in achieving better
results. We employed various fusion mechanisms and conclude that fusion at the input is not
performing well compared to fusion the features at the end which makes CNN able to learn and
extract its high order representation better.
Feature extraction for all the data windows considered in this study was computationally very
expensive and took few days. This can be accelerated using GPU programming and distributed
and parallel computing methodologies. Many extensions are possible at the level of architecture,
feature selection, and explainablity. We can investigate different architectures to capture other
relations, since the assumption that we started from in this study is that the smallest input possible
is a window of the same feature in different frequencies, we would have to investigate if the
smallest possible input window of the same frequency in different features yields better accuracy
showing that the seizure is more related at the frequency level rather than the feature level and we
will be able to do XAI to deduce which frequency band is the most important. Our methods need
to encompass other EEG datasets to find more general models and analyse other seizure patient
biomarkers. Finally, our methods can be extended to learn sequence relationships at transition
episodes where states shift from pre-ictal to ictal as well as ictal to post-ictal transitions to
characterize the state transitions during seizures.
Finally, we explained the relevance of each feature to characterize the dynamics during seizures.
In the future, these values can be extended to capture the relevant features in specific frequency
bands thus providing spectral information along with the type of connectivity between channels
and hence more explanation about the seizures.
6. CONCLUSION
Seizure detection using a sequence model was proposed in this study. We have shown that
relating higher resolution data together as a sequence can characterize differences between
seizure and non-seizure data. Among the four studied models based on deep learning, model 1
which used fusion at the level of the decision recorded 97.03% accuracy, 97.65% sensitivity,
96.58% specificity and a 95.40% precision. The models were based on different neural network
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
17
architectures mainly on CNN and LSTM with attention layers. The learned weights of the model
helped understand the relevance of the chosen features and further showed that they can represent
cross-patient discriminative features and open way to many future studies for seizure analysis.
REFERENCES
[1] World Health Organization. (2006). Neurological disorders: public health challenges. World Health
Organization.
[2] Truccolo, W., Ahmed, O. J., Harrison, M. T., Eskandar, E. N., Cosgrove, G. R., Madsen, J. R., ... &
Cash, S. S. (2014). Neuronal ensemble synchrony during human focal seizures. Journal of
Neuroscience, 34(30), 9927-9944.
[3] Alotaiby, T. N., Alshebeili, S. A., Alshawi, T., Ahmad, I., & El-Samie, F. E. A. (2014). EEG seizure
detection and prediction algorithms: a survey. EURASIP Journal on Advances in Signal
Processing, 2014(1), 183.
[4] Zaylaa, A. J., Harb, A., Khatib, F. I., Nahas, Z., & Karameh, F. N. (2015, September). Entropy
complexity analysis of electroencephalographic signals during pre-ictal, seizure and post-ictal brain
events. In 2015 International Conference on Advances in Biomedical Engineering (ICABME) (pp.
134-137). IEEE.
[5] Carney, P. R., Myers, S., & Geyer, J. D. (2011). Seizure prediction: methods. Epilepsy &
behavior, 22, S94-S101.
[6] Friston, K. J. (2011). Functional and effective connectivity: a review. Brain connectivity, 1(1), 13-36.
[7] Pereda, E., Quiroga, R. Q., & Bhattacharya, J. (2005). Nonlinear multivariate analysis of
neurophysiological signals. Progress in neurobiology, 77(1-2), 1-37.
[8] Bastos, A. M., & Schoffelen, J. M. (2016). A tutorial review of functional connectivity analysis
methods and their interpretational pitfalls. Frontiers in systems neuroscience, 9, 175.
[9] Kovach, C. K. (2017). A biased look at phase locking: Brief critical review and proposed
remedy. IEEE Transactions on signal processing, 65(17), 4468-4480.
[10] Van Mierlo, P., Papadopoulou, M., Carrette, E., Boon, P., Vandenberghe, S., Vonck, K., &
Marinazzo, D. (2014). Functional brain connectivity from EEG in epilepsy: Seizure prediction and
epileptogenic focus localization. Progress in neurobiology, 121, 19-35.
[11] Thorniley, J. (2011). An improved transfer entropy method for establishing causal effects in
synchronizing oscillators. In ECAL (pp. 797-804).
[12] Battaglia, D., Witt, A., Wolf, F., & Geisel, T. (2012). Dynamic effective connectivity of inter-areal
brain circuits. PLoS computational biology, 8(3).
[13] Brázdil, M., Halámek, J., Jurák, P., Daniel, P., Kuba, R., Chrastina, J. & Rektor, I. (2010). Interictal
high-frequency oscillations indicate seizure onset zone in patients with focal cortical
dysplasia. Epilepsy research, 90(1-2), 28-32.
[14] B.J. Edelman, N. Johnson, A. Sohrabpour, S. Tong, N. Thakor, B. He, Systems neuroengineering:
understanding and interacting with the brain, Engineering. 1 (2015) 292–308. doi:10.15302/j-
eng2015078.
[15] P. Van Mierlo, E. Carrette, H. Hallez, R. Raedt, A. Meurs, S. Vandenberghe, D. Van Roost, P. Boon,
S. Staelens, K. Vonck, Ictal-onset localization through connectivity analysis of intracranial EEG
signals in patients with refractory epilepsy, Epilepsia. 54 (2013) 1409–1418. doi:10.1111/epi.12206
[16] Bandarabadi, M., Teixeira, C. A., Rasekhi, J., & Dourado, A. (2015). Epileptic seizure prediction
using relative spectral power features. Clinical Neurophysiology, 126(2), 237-248.
[17] Usman, S. M., Usman, M., & Fong, S. (2017). Epileptic seizures prediction using machine learning
methods. Computational and mathematical methods in medicine, 2017.
[18] Wang, H. E., Bénar, C. G., Quilichini, P. P., Friston, K. J., Jirsa, V. K., & Bernard, C. (2014). A
systematic framework for functional connectivity measures. Frontiers in neuroscience, 8, 405.
[19] Ramani, R. G., Sivagami, G., & Jacob, S. G. (2012). Feature relevance analysis and classification of
parkinson disease tele-monitoring data through data mining techniques. International Journal of
Advanced Research in Computer Science and Software Engineering, 2(3).
[20] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
[21] Tsiouris, Κ. Μ., Pezoulas, V. C., Zervakis, M., Konitsiotis, S., Koutsouris, D. D., & Fotiadis, D. I.
(2018). A Long Short-Term Memory deep learning network for the prediction of epileptic seizures
using EEG signals. Computers in biology and medicine, 99, 24-37.
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
18
[22] Cho, K. O., & Jang, H. J. (2020). Comparison of different input modalities and network structures for
deep learning-based seizure detection. Scientific Reports, 10(1), 1-11.
[23] Boonyakitanont, P., Lek-uthai, A., Chomtho, K., & Songsiri, J. (2019). A Comparison of Deep
Neural Networks for Seizure Detection in EEG Signals. bioRxiv, 702654.
[24] Gunning, D. (2017). Explainable artificial intelligence (xai). Defense Advanced Research Projects
Agency (DARPA), nd Web, 2.
[25] Weiss, S. A., Lemesiou, A., Connors, R., Banks, G. P., McKhann, G. M., Goodman, R. R., & Diehl,
B. (2015). Seizure localization using ictal phase-locked high gamma: a retrospective surgical
outcome study. Neurology, 84(23), 2320-2328.
[26] Myers, M. H., Padmanabha, A., Hossain, G., de Jongh Curry, A. L., & Blaha, C. D. (2016). Seizure
prediction and detection via phase and amplitude lock values. Frontiers in human neuroscience, 10,
80.
[27] Douw, L., van Dellen, E., de Groot, M., Heimans, J. J., Klein, M., Stam, C. J., & Reijneveld, J. C.
(2010). Epilepsy is related to theta band brain connectivity and network topology in brain tumor
patients. BMC neuroscience, 11(1), 103.
[28] Hossain, M. S., Amin, S. U., Alsulaiman, M., & Muhammad, G. (2019). Applying deep learning for
epilepsy seizure detection and brain mapping visualization. ACM Transactions on Multimedia
Computing, Communications, and Applications (TOMM), 15(1s), 1-17.
[29] Zhang, X., Yao, L., Dong, M., Liu, Z., Zhang, Y., & Li, Y. (2019). Adversarial Representation
Learning for Robust Patient-Independent Epileptic Seizure Detection. arXiv preprint
arXiv:1909.10868
[30] Li, Q., Chen, Y., Wei, Y., Chen, S., Ma, L., He, Z., & Chen, Z. (2017). Functional Network
Connectivity Patterns between Idiopathic Generalized Epilepsy with Myoclonic and Absence
Seizures. Frontiers in computational neuroscience, 11, 38.
[31] Daoud, H., & Bayoumi, M. A. (2019). Efficient epileptic seizure prediction based on deep
learning. IEEE transactions on biomedical circuits and systems, 13(5), 804-813.
[32] Akbarian, B., & Erfanian, A. (2019). A framework for seizure detection using effective connectivity,
graph theory and deep modular neural networks. arXiv preprint arXiv:1909.03091
[33] Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional
networks. In European conference on computer vision (pp. 818-833). Springer, Cham.
[34] Meka, A., Maximov, M., Zollhoefer, M., Chatterjee, A., Seidel, H. P., Richardt, C., & Theobalt, C.
(2018). Lime: Live intrinsic material estimation. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (pp. 6315-6324).
[35] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for
discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 2921-2929).
[36] Chattopadhyay, A., Sarkar, A., Howlader, P., & Balasubramanian, V. N. (2017). Grad-CAM++:
Improved Visual Explanations for Deep Convolutional Networks. arXiv preprint arXiv:1710.11063.
[37] Roy, D., Murty, K. S. R., & Mohan, C. K. (2015, July). Feature selection using deep neural networks.
In 2015 International Joint Conference on Neural Networks (IJCNN) (pp. 1-6). IEEE.
[38] Ramani, R. G., Sivagami, G., & Jacob, S. G. (2012). Feature relevance analysis and classification of
parkinson disease tele-monitoring data through data mining techniques. International Journal of
Advanced Research in Computer Science and Software Engineering, 2(3).
[39] Phang, Chun-Ren & Numan, Fuad & Hussain, Hadri & Ting, Chee-Ming & Ombao, Hernando.
(2019). A Multi-Domain Connectome Convolutional Neural Network for Identifying Schizophrenia
from EEG Connectivity Patterns. IEEE Journal of Biomedical and Health Informatics. PP. 1-1.
10.1109/JBHI.2019.2941222.
[40] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). “Why should I trust you?" Explaining the
predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on
knowledge discovery and data mining (pp. 1135-1144).
[41] Abdelhameed, A. M., Daoud, H. G., & Bayoumi, M. (2018, June). Deep convolutional bidirectional
LSTM recurrent neural network for epileptic seizure detection. In 2018 16th IEEE International New
Circuits and Systems Conference (NEWCAS) (pp. 139-143). IEEE.
[42] Fellous, J. M., Sapiro, G., Rossi, A., Mayberg, H. S., & Ferrante, M. (2019). Explainable Artificial
Intelligence for Neuroscience: Behavioral Neurostimulation. Frontiers in Neuroscience, 13, 1346.
International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
19
AUTHORS
Hmayag Partamian is a Ph.D. candidate in the Electrical and Computer Engineering
(ECE) department at the American University of Beirut (AUB). He graduated with a
B.E. in electrical and computer engineering from the Lebanese University in 2005 and
masters and a M.S. in computational science from AUB in 2014. His research
encompasses diverse signal analysis and machine learning algorithms such as seismic
data analysis, biomedical signal analysis (EEG and ECG), image processing, and
vibration analysis. His current research encompasses modeling of the brain during
seizures using EEG data and developing detection and prediction models using decomposition techniques
and machine learning algorithms.
Fouad Khnaisser a recent computer and communication engineering graduate from
the American University of Beirut. He first started as a research assistant in the AUB
Mind Lab in his third year where he focused on analysing and classifying speech for
different purposes like emotional speech classification and reproduction.
Mohamad Mansour is a recent computer and communication engineering graduate
from the American University of Beirut. He has been part of the Socially Competent
Robotic and Agent Technologies - research group at CYENS Centre of Excellence in
Cyprus in the Explainable AI field. His research interests are natural language
processing, computer vision, reinforcement learning, and robotics.
Reem A. Mahmoud is a Ph.D. candidate in the Electrical and Computer Engineering
(ECE) department at the American University of Beirut (AUB). She graduated with a
B.S. in Electrical Engineering with high distinction from Alfaisal University in Riyadh,
Saudi Arabia, and an M.E. from AUB in 2015 and 2017, respectively. Her main area of
research is theoretical machine learning with a focus on learning from limited time-
series data. Her interests also extend to knowledge transferability and personalization
in machine learning.
Hazem Hajj is an Associate Professor with the American University of Beirut (AUB).
He is a senior member of IEEE and ACM. Over the years, Hazem has established
leadership in the field of artificial intelligence (AI) building on a strong mix of
industry and academics experiences at Intel Corporation and AUB. He received his
PhD from the University of Wisconsin-Madison in 1996, and his Bachelor from AUB
with distinction. Over the years, Hazem has been the recipient of numerous academic
and industry awards. His research interests include Artificial Intelligence (AI),
Machine Learning and Energy-Aware Computing, with special interests in Natural Language Processing
and Context Aware Sensing. His research has produced over 100 publications in the AI field in addition to
multiple patents and awards. His research has been funded by local and international funding sources,
including funding from Intel Corporation and Qatar National Research Fund (QNRF).
Fadi N. Karameh is an Associate Professor in the Electrical and Computer
Engineering Department at the American University of Beirut (AUB) in Beirut,
Lebanon. Prof Karameh joined AUB in 2003 shortly after graduating from the
Laboratory of Information and Decision Systems at the Massachusetts Institute of
Technology (MIT) in Cambridge, USA. His research includes system-theoretic
approaches in identification, estimation and signal processing in electrical engineering,
with an emphasis on neurophysiological signals and systems. His interdisciplinary interests include
developing identification and estimation tools for understanding nonlinear dynamic large-scale interactions
in brain cortical networks from multichannel electrical activity recordings.

More Related Content

PDF
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
PDF
Comparison of fuzzy neural clustering based outlier detection techniques
PDF
Af4102237242
PDF
Ea4301770773
PDF
ROLE OF CERTAINTY FACTOR IN GENERATING ROUGH-FUZZY RULE
PDF
Application of Hybrid Genetic Algorithm Using Artificial Neural Network in Da...
PDF
G124549
PDF
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
Comparison of fuzzy neural clustering based outlier detection techniques
Af4102237242
Ea4301770773
ROLE OF CERTAINTY FACTOR IN GENERATING ROUGH-FUZZY RULE
Application of Hybrid Genetic Algorithm Using Artificial Neural Network in Da...
G124549

What's hot (20)

PDF
Honors Thesis
PDF
Review on classification based on artificial
PDF
Segmentation
PDF
Successive iteration method for reconstruction of missing data
PDF
icmi2233-bixler
PDF
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
PDF
An Artificial Neural Network Model for Neonatal Disease Diagnosis
PDF
Performance Evaluation of Neural Classifiers Through Confusion Matrices To Di...
PDF
Brain Image Fusion using DWT and Laplacian Pyramid Approach and Tumor Detecti...
PDF
Mammogram image segmentation using rough clustering
PDF
Mammogram image segmentation using rough
PDF
Ijnsa050202
PDF
Telecardiology and Teletreatment System Design for Heart Failures Using Type-...
PDF
A review deep learning for medical image segmentation using multi modality fu...
PDF
Ijeee 16-19-a novel approach to brain tumor classification using wavelet and ...
PDF
COMPARISON OF ARTIFICIAL NEURAL NETWORKS AND FUZZY LOGIC APPROACHES FOR CRACK...
PDF
Performance analysis of neural network models for oxazolines and oxazoles der...
PDF
A novel based approach to investigate distinctive region of brain connectivit...
PDF
Pattern recognition system based on support vector machines
PDF
Genome structure prediction a review over soft computing techniques
Honors Thesis
Review on classification based on artificial
Segmentation
Successive iteration method for reconstruction of missing data
icmi2233-bixler
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
An Artificial Neural Network Model for Neonatal Disease Diagnosis
Performance Evaluation of Neural Classifiers Through Confusion Matrices To Di...
Brain Image Fusion using DWT and Laplacian Pyramid Approach and Tumor Detecti...
Mammogram image segmentation using rough clustering
Mammogram image segmentation using rough
Ijnsa050202
Telecardiology and Teletreatment System Design for Heart Failures Using Type-...
A review deep learning for medical image segmentation using multi modality fu...
Ijeee 16-19-a novel approach to brain tumor classification using wavelet and ...
COMPARISON OF ARTIFICIAL NEURAL NETWORKS AND FUZZY LOGIC APPROACHES FOR CRACK...
Performance analysis of neural network models for oxazolines and oxazoles der...
A novel based approach to investigate distinctive region of brain connectivit...
Pattern recognition system based on support vector machines
Genome structure prediction a review over soft computing techniques
Ad

Similar to 8421ijbes01 (20)

PDF
0 eeg based 3 d visual fatigue evaluation using cnn
PDF
A Novel Approach For Detection of Neurological Disorders through Electrical P...
PDF
Health electroencephalogram epileptic classification based on Hilbert probabi...
PDF
Investigating the interaction between EEG and fNIRS: A multimodal network ana...
PDF
Transfer learning for epilepsy detection using spectrogram images
PDF
Epilepsy detection using wavelet transform, genetic algorithm, and decision t...
PDF
AN EFFICIENT WAVELET BASED FEATURE REDUCTION AND CLASSIFICATION TECHNIQUE FOR...
PDF
AN EFFICIENT WAVELET BASED FEATURE REDUCTION AND CLASSIFICATION TECHNIQUE FOR...
PDF
Study and analysis of motion artifacts for ambulatory electroencephalography
PDF
A Novel Approach to Study the Effects of Anesthesia on Respiratory Signals by...
PDF
Ec4030
PDF
Survey analysis for optimization algorithms applied to electroencephalogram
PDF
Wavelet-based EEG processing for computer-aided seizure detection and epileps...
PDF
Effective electroencephalogram based epileptic seizure detection using suppo...
PDF
Development of an Automated Diagnostic System using Genetic Algorithms in Ele...
PDF
DEVELOPMENT OF AN AUTOMATED DIAGNOSTIC SYSTEM USING GENETIC ALGORITHMS IN ELE...
DOC
Chapter 5 applications of neural networks
PDF
neural-control-drone
PPTX
RVS CONFERENCE.pptx
PDF
Convolutional Networks
0 eeg based 3 d visual fatigue evaluation using cnn
A Novel Approach For Detection of Neurological Disorders through Electrical P...
Health electroencephalogram epileptic classification based on Hilbert probabi...
Investigating the interaction between EEG and fNIRS: A multimodal network ana...
Transfer learning for epilepsy detection using spectrogram images
Epilepsy detection using wavelet transform, genetic algorithm, and decision t...
AN EFFICIENT WAVELET BASED FEATURE REDUCTION AND CLASSIFICATION TECHNIQUE FOR...
AN EFFICIENT WAVELET BASED FEATURE REDUCTION AND CLASSIFICATION TECHNIQUE FOR...
Study and analysis of motion artifacts for ambulatory electroencephalography
A Novel Approach to Study the Effects of Anesthesia on Respiratory Signals by...
Ec4030
Survey analysis for optimization algorithms applied to electroencephalogram
Wavelet-based EEG processing for computer-aided seizure detection and epileps...
Effective electroencephalogram based epileptic seizure detection using suppo...
Development of an Automated Diagnostic System using Genetic Algorithms in Ele...
DEVELOPMENT OF AN AUTOMATED DIAGNOSTIC SYSTEM USING GENETIC ALGORITHMS IN ELE...
Chapter 5 applications of neural networks
neural-control-drone
RVS CONFERENCE.pptx
Convolutional Networks
Ad

Recently uploaded (20)

PPTX
Artificial Intelligence
PDF
PREDICTION OF DIABETES FROM ELECTRONIC HEALTH RECORDS
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PDF
Artificial Superintelligence (ASI) Alliance Vision Paper.pdf
PPT
Occupational Health and Safety Management System
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PDF
SMART SIGNAL TIMING FOR URBAN INTERSECTIONS USING REAL-TIME VEHICLE DETECTI...
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PPT
Total quality management ppt for engineering students
PPTX
introduction to high performance computing
PPT
INTRODUCTION -Data Warehousing and Mining-M.Tech- VTU.ppt
PPTX
Fundamentals of safety and accident prevention -final (1).pptx
PPTX
Current and future trends in Computer Vision.pptx
PDF
Unit I ESSENTIAL OF DIGITAL MARKETING.pdf
PPT
introduction to datamining and warehousing
PDF
BIO-INSPIRED ARCHITECTURE FOR PARSIMONIOUS CONVERSATIONAL INTELLIGENCE : THE ...
PPTX
Information Storage and Retrieval Techniques Unit III
PPTX
communication and presentation skills 01
PPTX
UNIT 4 Total Quality Management .pptx
PDF
Visual Aids for Exploratory Data Analysis.pdf
Artificial Intelligence
PREDICTION OF DIABETES FROM ELECTRONIC HEALTH RECORDS
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
Artificial Superintelligence (ASI) Alliance Vision Paper.pdf
Occupational Health and Safety Management System
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
SMART SIGNAL TIMING FOR URBAN INTERSECTIONS USING REAL-TIME VEHICLE DETECTI...
R24 SURVEYING LAB MANUAL for civil enggi
Total quality management ppt for engineering students
introduction to high performance computing
INTRODUCTION -Data Warehousing and Mining-M.Tech- VTU.ppt
Fundamentals of safety and accident prevention -final (1).pptx
Current and future trends in Computer Vision.pptx
Unit I ESSENTIAL OF DIGITAL MARKETING.pdf
introduction to datamining and warehousing
BIO-INSPIRED ARCHITECTURE FOR PARSIMONIOUS CONVERSATIONAL INTELLIGENCE : THE ...
Information Storage and Retrieval Techniques Unit III
communication and presentation skills 01
UNIT 4 Total Quality Management .pptx
Visual Aids for Exploratory Data Analysis.pdf

8421ijbes01

  • 1. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 DOI: 10.5121/ijbes.2021.8401 1 A DEEP MODEL FOR EEG SEIZURE DETECTION WITH EXPLAINABLE AI USING CONNECTIVITY FEATURES Hmayag Partamian, Fouad Khnaisser, Mohamad Mansour, Reem Mahmoud, Hazem Hajj and Fadi Karameh Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon ABSTRACT During seizures, different types of communication between different parts of the brain are characterized by many state of the art connectivity measures. We propose to employ a set of undirected (spectral matrix, the inverse of the spectral matrix, coherence, partial coherence, and phase-locking value) and directed features (directed coherence, the partial directed coherence) to detect seizures using a deep neural network. Taking our data as a sequence of ten sub-windows, an optimal deep sequence learning architecture using attention, CNN, BiLstm, and fully connected neural networks is designed to output the detection label and the relevance of the features. The relevance is computed using the weights of the model in the activation values of the receptive fields at a particular layer. The best model resulted in 97.03% accuracy using balanced MIT-BIH data subset. Finally, an analysis of the relevance of the features is reported. KEYWORDS Seizure detection, deep sequence learning, brain connectivity, explainability, feature relevance. 1. INTRODUCTION Epilepsy is a neurological disorder that affects around 50 million people of all ages worldwide [1]. It is characterized by the frequent and repetitive occurrence of seizures that disrupt normal function and affect the quality of life of the patient. Neuroimaging techniques acquired from epilepsy patients showed evidence of different structural and functional irregularities. Synchronous discharges of electrical activity across different parts of the brain during seizures were captured using electroencephalogram (EEG) data [2]. To extract meaningful information, a plethora of measures have been designed to characterize the changes in the EEG signals during epilepsy that can be classified into two types. Univariate metrics measure the information in a window of one time series and can be classified as temporal, spectral, and entropy-based measures [3] [4] [5]. On the other hand, multivariate connectivity metrics [6] [7] [8], characterize information between multiple time series and can be directed (effective) or undirected (functional). For example, the phase-locking value (PLV) is an undirected measure since it quantifies the synchrony between a pair of signals [9]. Granger Causality (GC) calculates the amount of information transfer from one channel to the other which makes it a directed measure [10] [11] [12]. Cross-frequency coupling (CFC) methods have also been used in characterizing information between different brain regions by studying the interaction between oscillations across different frequency bands such as phase-amplitude coupling (PAM) [13].
  • 2. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 2 Ictal episodes are those that exhibit seizure activity and will have different connectivity values when compared to non-ictal periods [14]. Also, seizure onset zones usually get isolated in their activity before the seizure starts which can be captured using a coherence connectivity matrix [15]. Electrophysiological research also reports that functional connectivity analysis allows localization of seizure onset zones (SOZ), whose exact location helps increase surgery success rates [3]. Besides, variations in phase-locking value (PLV) can characterize the synchronous activity between different parts of the brain during a seizure and non-seizure episodes [9]. In the last decade, advances in technology have made machine learning and big data analysis available and many seizure detection algorithms were developed using diverse machine learning algorithms. Support vector machines (SVM) is one of the common techniques used to learn classification algorithms for seizure detection [16] [17] [3]. Usually, many classical tailored features are fed to the SVM classifiers. Therefore, employing unnecessary features may hinder both the performance and the speed of analysis [18]. Feature selection techniques were proposed to reduce the feature set since machine learning algorithms often perform better when relevant features are selected and used for learning [19]. An alternative branch of machine learning is deep learning (DL) which studies multiple-layer architectures of neural networks that can learn discriminating features and a classifier simultaneously [20]. The raw EEG data, snapshot images, different univariate and bivariate measures were used as input to deep classifiers to detect seizures and non-seizure episodes during epilepsy [21] [22] [23]. However, deep networks are perceived as “black box” techniques since the role of the different layers and the overall internal functioning are unknown. A major issue arises because scientists need to understand why such a decision was made. Can such models be trusted without knowing why they fail and why they succeed? Explainable artificial intelligence (XAI) [24] techniques try to derive explanations from the parameters of the deep network to infer knowledge and build explainable features [13]. In epilepsy analysis, seizure detection is an important problem, however, for doctors to be able to diagnose epilepsy, they need deeper information about the interactions of the brain regions such as the localization of SOZ which can help identify a resection area for surgery. Experimental studies have shown the phase-amplitude coupling [13] and the power in the high-frequency band (>100Hz) [25] increase in the seizure onset zone and during the seizures. The SOZ also gets disconnected from the rest of the brain regions that can be inferred using undirected connectivity measures such as PLV [26]. Another study shows that the phase-lag index (PLI) connectivity in the θ (4-8 Hz) band can be linked to tumor-related epilepsy [27]. Explainability during seizure analysis were also addressed to find the feature relevance at different frequency bands [28] and extract information from the learned weights to derive topographic brain maps [28] [29]. In this study, we selected a subset of the different connectivity measures to characterize the brain signals from different perspectives. spectral matrix (SM), the inverse of the spectral matrix (IS), coherence (COH), partial coherence (PC), and phase-locking value (PLV) as undirected measures, and directed coherence (DC), and the partial directed coherence (PDC) as directed features. We feed the chosen features to learn a deep model that not only classifies seizures but also computes how much each of the features has participated in the decision made by the detector using the deep learning model parameters. The weights of the model provide explainability at the level of the features. This is of key importance for epilepsy analysis since it provides the user with additional valuable information. Also, seizure events differ from one user to the other and the model will be generally trained using different patient data that have different seizure dynamics. The types of epilepsy in the training set can also be different and may exhibit different connectivity values [30]. Therefore, when new data is tested, the proposed model will detect whether a specific interval of EEG is a part of a seizure or not and also outputs the percentage of the impact of each of the employed connectivity measures on the model’s decision.
  • 3. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 3 We summarize our contributions as follows:  Unlike classical methods where they use a single window, our model considers a 20- second window with ten 2 second sub-windows to characterize interdependencies between these windows through time and better describe seizure and non-seizure intervals of the EEG data.  The model is designed to perform seizure detection of the 20-second data using attention, convolutional neural networks (CNN), fully connected layers (FCNN), and bidirectional long short term memory (BiLstm) methodologies which is a type of a recurrent neural network (RNN).  In [37], different fusion methods were employed while building the architecture of the models. We also employed different fusion methods and compared them to understand relationships between features and the output.  Finally, we infer from the weights of the network the relevance of the connectivity measures on the decision made by the detector. We study the impact of the features on the output generally across all patients as well as per patients. To the best of our knowledge, no other work has studied explainability with connectivity analysis during seizure classification. The rest of the paper is organized as follows. In section 2, we present all the related work for this study. In section 3, we explain our methodology providing the different designs and workflows of the proposed method. In section 4, a series of experiments are conducted to evaluate the performance of our design. In section 5, we discuss our findings, its limitations, and propose possible future extensions. We finally conclude in section 6, providing a summary of our work. 2. RELATED WORK Two tasks arise while working with epileptic data, classifying the sample as ictal or non-ictal or classifying the window as preictal (the period that precedes an ictal phase) or inter-ictal (period of normal activity). Even though we will tackle the ictal/non-ictal classification problem, studying the preictal/inter-ictal problem is useful as it gives us intuition and inspiration to develop our architecture. The state of the art for preictal and inter-ictal classification is produced by Daoud et al. [31] where they describe a methodology to train, using the raw multichannel EEG data, a deep convolutional autoencoder (DCAE) and uses the latent space representation i.e. (output of the pre-trained encoder) of each recording as input to a BiLstm network that classifies the example into preictal and inter-ictal. The Dataset used is the CHB-MIT EEG dataset recorded at Children’s Hospital Boston and is publicly available. To narrow down the channels considered (23 total channels), an iterative algorithm was used to select the channels. The algorithm calculates the product between variance and entropy for each channel and iteratively trains the model on bigger windows until all the channels are considered to select the best combination of performance accuracy and computational cost. Through this approach, an accuracy of 99.66% is achieved. As for ictal/non-ictal prediction, Akbarian et al. [32] use effective brain connectivity, the directed transfer function (DTF), the directed coherence (DC), and the generalized partial directed coherence (GPDC) in various frequency bands, to measure the relation between brain regions. They extracted features from each of these measures using graph theory. They fed each set of newly extracted features to an autoencoder (AE) for feature reduction then used a softmax on the output of the encoder, obtaining 3 classifiers. The decision of ictal/non-ictal is then inferred through majority voting. Through this method and using the CHB-MIT dataset, they achieved a 99.43% accuracy.
  • 4. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 4 Table 1 presents an overview of different architectures and their respective task-specific accuracies. We also include the features used by each study. When using different features we need to combine them to feed them to the network, different types of combinations/fusions exist such as concatenating them at the input layer, or feeding them through separate networks and concatenating their respective outputs and use them as input to the final output layer. Fusion methods were applied in EEG with the Schizophrenia detection problem where they showcase the effect of varying fusion mechanisms on the performance of the deep network [39]. Another Table 1. Comparable Analysis Ref. Task Feature Selection Deep Architecture Acc. XAI [23] Seizure Detection Raw, Spectral, temporal, EEG snapshot, spectrogram FCNN, RNN, DNN 99.7% No [28] [29] Seizure Detection Seizure Detection Raw Data Raw Data CNN CNN, Attention, FCNN 98.05% - Yes Yes [31] Seizure Prediction Raw data AE+ BiLstm 99.66% No [32] Seizure Detection Directed Connectivity and Graph Metrics DNN 99.43% No [37] Video Detection Trajectory Features DNN 93.33% Yes [39] Schizophrenia Detection Mixed Connectivity CNN 91.69% No [41] Seizure Detection Raw Data CNN, BiLstm 98.89% No Proposed Method Seizure Detection 7 Connectivity Measures CNN, BiLstm, Attention, FCNN 97.03% Yes approach for epilepsy detection uses the raw EEG data as input to a bidirectional recurrent network that uses the past window knowledge to predict the next window’s label. The method can discriminate normal-ictal and normal-ictal-interictal EEG signals accurately [41]. New techniques are currently developing and shaping into a field of study called explainable AI (XAI) [24] where researchers try to make use of the learned blocks of information inside the deep learning models. A modified deep learning model will also learn explainable features while training. The first efforts employed the deep CNN using deconvolution methods to explain the feature maps. Parts of the image that activated certain neurons were marked during the process [33]. Another XAI algorithm, LIME (Local Interpretable Model-Agnostic Explanations), finds for every test sample the relevance of a particular learned feature for a specific output using a local approximation with a linear sparse model [34]. Class activation mapping (CAM) is another method for saliency map generation primarily used for object localization in the image. CNN with a final Global average pooling (GAP) layer was constructed and the weights of the GAP layer were employed to compute heat maps showing the localization of objects in an image [35].
  • 5. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 5 Better results were obtained with Grad-CAM++ which is a variation of CAM that uses the gradient of the output class with respect to the activation of the feature maps to find better saliency maps[36]. Roy et al. proposed a task aware selection of features by learning a deep neural network (DNN) for action recognition using video as input. They extract a 426 trajectory and motion features, and after learning the DNN, they study the activation potential normalized over all layers to quantify feature relevance. The authors employ the first layer activation functions to define a contribution measure for each feature [37]. Feature relevance was also studied during Parkinson disease using data mining techniques [38]. Adversarial representation learning methods were employed for robust general seizure detection models. In one study, a deep CNN model that employs 2-second window raw data as input was analysed where the authors employ the weights of the learned model to visualize internal functions of the network and extract feature maps. Using the maps as receptive fields in the intermediate layers, they investigate domain-specific knowledge and class-discriminative features using correlation maps in different frequency bands which were further processed to construct scalp topographies [29]. None of these methods can provide deeper insights about the data. Our proposed method is designed to keep track of the used features and provide explainability of the measures used. The features that trigger the decision of the model will be revealed and this will provide valuable information that can be related to the type of disease or type of interactions during seizure and non-seizure episodes. We will also show that different seizure patients exhibit diverse feature relevance maps which can be used for further analysis. 3. PROPOSED METHODOLOGY 3.1. Overview of the method As depicted in figure 1, epileptic EEG data is segmented into short-duration fixed length windows from which we extract the five common brain rhythms δ (2−4 Hz), θ (4−8 Hz), α (8−13 Hz), β (13−30 Hz), γ (greater than 30Hz) using Butterworth bandpass filters. For each of these rhythms, we compute seven connectivity measures: Spectral matrix, inverse of spectral matrix, coherence, partial coherence, directed coherence, partial directed coherence, and phase-locking value. Figure 1. The overall Workflow
  • 6. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 6 These connectivity measures are arranged in a tensor and fed as features to a deep seizure detection network. We evaluate and compare four different deep learning models by manipulating the fusion mechanism. Since our features represent different perspective connectivity measures, we intend to benefit from the rich information found in these features. Similar to [37], we use the activation values of the learned model to output the type of connectivity that quantifies participation percentage in making the model make a certain decision. 3.2. Data Preparation and Processing The raw data were first pre-processed to extract 20-seconds long windows of seizure and non- seizure data which were manually extracted using the labels provided with the data. Each of these signals is divided further into ten 2-seconds long sub-windows resulting in a sequence of 10 seizure sub-windows that are fed to bandpass filters to extract five rhythms for each. 3.3. Feature Extraction Taking our channels as a multivariate process , the multivariate linear shift-invariant filter representation can be expressed by: (1) where is a vector of zero mean inputs and is matrix representing a filter impulse response. On the other hand, the multivariate autoregressive (MVAR) model of order can be expressed as: (2) where can be considered as uncorrelated zero mean Gaussian noise. This model can allow defining interactions between different signals such as coupling and causality using the matrices since the term quantifies the causal linear interactions between and at lag k. In the frequency domain, using the Fourier transform, the above equations yield and . By comparing the two spectral representations above, one can derive the following relation: . The cross-spectral density matrix S(f) and its inverse P(f) are defined by and where the superscript represents the Hermitian transpose and represents the covariance of The coherence between the two signals and at a frequency f is can now be derived as: (3)
  • 7. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 7 while the directed coherence can be expressed by: (4) where represents the variance of signal . The partial coherence can also be derived in a similar fashion and can be represented by: (5) from which we can infer the partial directed coherence, PDC, defined by: (6) On the other hand, phase locking value is a measure that quantifies the synchrony between two signals. The signals and are first band pass filtered in the specified frequency bands, and then Hilbert transform is applied to extract the corresponding phases and . The phase locking value (PLV) can be expressed as: (7) where N is the number of samples considered per window. 3.4. Deep Learning Model Architectures The data formed is in the form of a tensor of dimension 7x10x19x19x5. Each data sample takes in 10-time windows of the 7 connectivity features of 2-second intervals, each window is represented by a 19x19x5 matrix where the third dimension represents the frequency band and 19x19 is the connectivity matrix. Each row and column represent an EEG channel so one entry in this matrix represents the connectivity of the channel along the row with the channel along the column in one of the 5 different frequency bands. In our four architectures, all convolution operations use one filter because we are trying to capture a numerical combination of the input matrix and not a characteristic (i.e. shade or edges) since it’s not an image. Furthermore, they have a similar base architecture, but they use different schemes to combine the different features. For our first architecture shown in figure 2, we separate the 7 features and the five frequency bands of each feature, resulting into 35 independent inputs, and feed them into separate, identical blocks as depicted in figure 2A. We process each input as a time series so each window passes through one 2D convolution layer of kernel size of (19,1) . This operation helps us to condense all the relations that a channel has with the other into one number. Since we obtained one feature vector per window, we feed them to an LSTM block followed by a self-attention layer and fully
  • 8. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 8 connected layers. Each one of the 35 inputs pass through this block and at the end, the vectors obtained from the last FCNN layers are concatenated and fed to an FCNN layer for classification as shown in figure 2B. This fusion scheme assumes total independence of the features and frequency bands, the feature vectors are concatenated before the last FCNN layer. The second fusion scheme, instead of combining the feature vectors in the last layer, we combine the vectors obtained after the attention layer of each feature and feed them to FCNN layers as shown in figure 3. For the third fusion scheme, after having a feature vector of size 19 for each window for each feature we concatenate the feature vectors of similar time steps together, obtaining a 19x7 matrix then we proceed with a 2D convolution layer of kernel size (1,19) to get one feature vector of size 19. We obtain channel-wise combination of the different features which we feed into an LSTM block followed by an attention layer and FCNN layers (figure 4). Figure 5 depicts the fourth fusion scheme, after having obtained a 19x19 matrix for each window of each feature we concatenate the feature matrices of similar time steps together, obtaining a 19x19x7 matrix for each time step then we proceed with a 3D convolution layer of kernel size (1,1,7) to get a 19x19 matrix in order to get a frequency wise combination of the different features and then proceed with a 2D convolution layer of kernel size (1,19) followed Figure 2. Model 1 Architecture
  • 9. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 9 Figure 3. Model 2 Architecture Figure 4. Model 3 architecture
  • 10. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 10 Figure 5. Model 4 architecture by an LSTM block, an Attention layer, and FCNN layers. Thus, our 4 fusion schemes aim at reducing the level at which relationships between the features are the strongest. The first scheme assumes no relation between features and frequencies, the second assumes a high-level relation, the third assumes a channel-wise relation and the fourth one assumes a frequency wise relation. 3.5. Feature Relevance For the XAI part, we planned to address one of the fundamental neuroscientific questions concerned in finding relevant temporal and spatial scales necessary for given behaviour [28] by doing a statistical study for the relationships between the derived connectivity measures varying between spectral, causal, and phase-related, that characterize the brain signals on one hand and the seizure-nonseizure for patient-specific and cross-patient cases on the other hand. We aim at finding some link between the explained features contribution results to some of the scientific facts concerning the seizure detection and neurology fields. To apply XAI, we first targeted our research on the pre-processing and extracting different connectivity measures (SM, ISM, DC, C, PDC, PC, PLV) across different frequency bands ) that we believe, based on some prior knowledge and experimental studies, have a direct impact on seizure detection [13][25][26][27]. Then, we tried to achieve our explainability in the post-modelling stage using the input-based explanation drivers methods where we base our feature study on the output predictions [42]. As we see in the structure of model 2, we tried separating each feature’s CNN-LSTM unit alone and concatenate them at a later stage, in such a way we can simply do our study using the concatenation and the first dense layer only. This allows us to keep track of the features. The concatenation layer will combine the outputs from the 7 features separated CNN-LSTM units into one flatten layer as depicted in figure 6 as vector . Our further investigation was based on the feature extraction paper [37] at the first dense layer. To get the relevance of each of the input neurons of the flatten layer, we first calculated the average absolute activation potential contributed by the dimension of the input: (8)
  • 11. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 11 where the activation is: . Then, we find the relative contribution of the input dimension towards the activation of the hidden neuron: (9) In order to get the total net contribution is of an input dimension overall hidden layers, we then computed the net of all ’s for every input overall : (10) Since, our input is a set of neurons per feature in order of the sequence SM, ISM, DC, C, GDPC, PC, PLV respectively, then we further summed each of that sets alone to get the net per feature. According to the feature extraction paper, the higher the contribution of an input dimension, the more likely it is its participation in hidden neuronal activity and consequently, classification [37]. Figure 6. Last layer architecture weights and activations from which relevance of features are computed To begin our study, we first input the whole pre-processed dataset, cross-patient study, for seizure and non-seizure cases separately in the first part of the model, the layers before the concatenation part, and extracted the embeddings or the 400 neuron per feature and then got our results according to the technique described above. The analysis is further extended to find inter-patient feature relevance variations.
  • 12. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 12 4. EXPERIMENTAL RESULTS 4.1. Experimental Setup CHB-MIT is an EEG dataset collected at Children’s Hospital in Boston. It has 24 cases of epilepsy. EEG was acquired using the international 10-20 system sampled at 256 samples per second with a 16-bit resolution. Overall 198 seizures are annotated with the beginning and the end of seizures. This particular dataset was chosen because it was used in many state of the art papers which allows us to compare our deep learning model’s performance. As shown in figure 7, we extract from this dataset 20 second intervals from seizure and non-seizure episodes. The seizure of length less than 20 seconds, which were very few, were ignored in this study and the seizure episodes of duration greater than 20 seconds were dissected into 20 second intervals. The remainders were also considered by taking 20 seconds from the end. As for the non-seizure episodes, four 20-second intervals were taken randomly from every dataset. The total number of 20 episode intervals collected contained 543 seizure intervals and 801 non-seizure intervals. 4.2. Validation Metrics To evaluate the performance of the models, we use the statistical measures of binary classification. We denote the seizure label as the positive class and the non-seizure case as the negative class. We first define the following terms by  True Positive (TP) : number of hits, correctly classified positives  False Positive (FP): number false alarms, classified as seizure while it actually has no seizure.  False Negative (FN): number of misses, classified as non-seizure while it actually is seizure.  True Negative (TN): number of correct rejections, correctly classified negatives. Sensitivity measures the percentage of positive class members that are correctly identified and is given by: (11) Specificity gives the percentage of negative class members that are correctly identified whose formula is: (12) Precision finds the positive predictive rate which explains how much of the positives were identified. (13) Finally, the accuracy is computed using: (14)
  • 13. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 13 4.3. Feature Extraction Each of these 20-second intervals was first filtered in the five frequency bands described above and further dissected into ten 2-second intervals each of which is processed to extract the seven described features. The ten sub-intervals are then gathered into a single tensor resulting in a tensor of size 7x10x19x19x5 as shown in figure 7. Figure 7. Data preparation and feature extraction 4.4. Performance Results of the proposed models The models were evaluated on a data split of 85% training and 15% testing. After fine tuning, the first model performed the best with a 97.03% accuracy as you can see in Table 2. The results also show that the earlier the fusion scheme the weaker the relation between the features. Since our data dissection method is totally new, no quantitative comparative analysis was performed. We train all our models twice, the first time non-seizure data has a label of 1 and seizure data has a label of 0 and the second time, using the resulting weights from the first time, the labels are flipped, that procedure forces the model to learn more robust features for each class. The data was too big to load into memory, so we had to write our own data generator that fetches batches of 32 samples from the folder. The hyper parameters for each model are as follows:  Model1: a relu activation function for both dense and convolution layers, 2 LSTM layers and 1 dense layers of 100 neurons for each block and a dropout rate of 0.5. The optimizer used is Adam and 30 epochs for both training phases.  Model2: a relu activation function for both dense and convolution layers with a Spatial Dropout2D of 0.07, we use 2 LSTM layers and all dense layers have Dropout of 0.5. We use 2 dense layers of 263 and 20 neurons respectively. The optimizer used for the first training phase was RMSprop for 17 epochs, after the flip Nadam was used as optimizer for 8 epochs.
  • 14. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 14  Model3: a relu activation function for both dense and convolution layers, 2 LSTM layers and 3 dense layers of 500 neurons for each block and a dropout rate of 0.5 and l2 regularization. The optimizer used is Adam and 80 epochs for both training phases.  Model4: a relu activation function for both dense and convolution layers, 2 LSTM layers and 3 dense layers of 200 neurons for each block and a dropout rate of 0.3 and l2 regularization. The optimizer used is Adam and 100 epochs for both training phases. The performance of the different modes on training and testing data are tabulated in table II. The results are the average of ten runs with different splits each time. The overall accuracy, sensitivity, specificity, and the precision are shown. While the model is well fit to the training data in all models, unseen testing data performance is considered to choose the best model. Model 1 performs the best in all measures recording a sensitivity of 97.65%, a specificity of 96.58%, precision 95.4%, and an overall accuracy of 97.03%. Table 2. Performance Metrics Across all proposed models Data Sensitivity Specificity Precision Accuracy Model 1 Training 100.00 99.85 99.78 99.91 Model 1 Testing 97.65 96.58 95.40 97.03 Model 2 Training 99.13 98.10 97.22 98.51 Model 2 Testing 94.67 96.06 93.42 95.54 Model 3 Training 99.79 100.00 100.00 99.91 Model 3 Testing 80.23 92.24 88.46 87.13 Model 4 Training 96.08 96.49 94.84 96.32 Model 4 Testing 77.22 86.18 78.21 82.67 4.5. Feature Relevance Results Figure 8 represents the overall feature relevance of all the data. We can see that spectral matrix and partial coherence have higher relevance during seizures on average while non-seizure decisions are made more effective with the coherence and the partial directed coherence. Figure 8. Average feature relevance across all data
  • 15. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 15 Cross-patient test results show the variation of the feature relevance diagrams for each of the tests as can be seen in figures 9 and 10. We notice that every patient had different feature relevance plot than the other, such that there is a different way in assigning the relevance and weights of each feature which can be interpreted as a validation of the scientific fact that EEG patterns in seizure patients are highly variable across patients. On the other hand, there were some changes across the same patient results which can be explained on basing the features on both time and frequency domains and EEG seizure pattern data are highly dynamic in nature for the same patient. Fig. 9. Feature relevance for patient 15 Figure. 10 Feature relevance for patient 20 Another interesting finding is that the directed coherence feature is often assigned with the least net contribution The phase-locking value is often assigned with good relevance value in seizure and non-seizure compared to other features which might be an indication for the fact that PLV is one of the undirected connectivity measures that can infer the disconnection of the parts of the brain where the seizure onset zones get disconnected [26].
  • 16. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 16 5. DISCUSSION AND FUTURE WORK EEG data analysis has been studied using EEG data and many of the techniques rely on the data dissection and the features used. In this study, we adopted a sequential portion of EEG data which contains information about a sequence of ten consecutive EEG sub-windows. This choice was made because during seizures, the signals may alternate from seizure to non-seizure states. Also, during non-seizure data, short bursts of seizure activity may arise and can be labelled as seizure if taken separately. The BiLstm structure between these ten windows can learn relationships between the windows during a seizure and avoid such misclassifications. Besides, seizure data is generally noisy and stochastic in nature, and having relating higher-level information between consecutive windows can help learn more complex relationships across time. Our study focuses on discriminating seizure/non-seizure while providing explanations of our learned model which make it more presented different CNN-Lstm models for detecting seizure based on long windows. Our models used different fusion strategies where the first two models combine the features at the decision level, and the two combined them at the input level. We were able to show that combining features at the decision layers yield in a much better performance which can be an indication that the features are better learned when separated in the feature extraction part. In our study, we made use of the latest techniques and advancement mainly in the regularization and the normalization methods that helped our architecture in achieving better results. We employed various fusion mechanisms and conclude that fusion at the input is not performing well compared to fusion the features at the end which makes CNN able to learn and extract its high order representation better. Feature extraction for all the data windows considered in this study was computationally very expensive and took few days. This can be accelerated using GPU programming and distributed and parallel computing methodologies. Many extensions are possible at the level of architecture, feature selection, and explainablity. We can investigate different architectures to capture other relations, since the assumption that we started from in this study is that the smallest input possible is a window of the same feature in different frequencies, we would have to investigate if the smallest possible input window of the same frequency in different features yields better accuracy showing that the seizure is more related at the frequency level rather than the feature level and we will be able to do XAI to deduce which frequency band is the most important. Our methods need to encompass other EEG datasets to find more general models and analyse other seizure patient biomarkers. Finally, our methods can be extended to learn sequence relationships at transition episodes where states shift from pre-ictal to ictal as well as ictal to post-ictal transitions to characterize the state transitions during seizures. Finally, we explained the relevance of each feature to characterize the dynamics during seizures. In the future, these values can be extended to capture the relevant features in specific frequency bands thus providing spectral information along with the type of connectivity between channels and hence more explanation about the seizures. 6. CONCLUSION Seizure detection using a sequence model was proposed in this study. We have shown that relating higher resolution data together as a sequence can characterize differences between seizure and non-seizure data. Among the four studied models based on deep learning, model 1 which used fusion at the level of the decision recorded 97.03% accuracy, 97.65% sensitivity, 96.58% specificity and a 95.40% precision. The models were based on different neural network
  • 17. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 17 architectures mainly on CNN and LSTM with attention layers. The learned weights of the model helped understand the relevance of the chosen features and further showed that they can represent cross-patient discriminative features and open way to many future studies for seizure analysis. REFERENCES [1] World Health Organization. (2006). Neurological disorders: public health challenges. World Health Organization. [2] Truccolo, W., Ahmed, O. J., Harrison, M. T., Eskandar, E. N., Cosgrove, G. R., Madsen, J. R., ... & Cash, S. S. (2014). Neuronal ensemble synchrony during human focal seizures. Journal of Neuroscience, 34(30), 9927-9944. [3] Alotaiby, T. N., Alshebeili, S. A., Alshawi, T., Ahmad, I., & El-Samie, F. E. A. (2014). EEG seizure detection and prediction algorithms: a survey. EURASIP Journal on Advances in Signal Processing, 2014(1), 183. [4] Zaylaa, A. J., Harb, A., Khatib, F. I., Nahas, Z., & Karameh, F. N. (2015, September). Entropy complexity analysis of electroencephalographic signals during pre-ictal, seizure and post-ictal brain events. In 2015 International Conference on Advances in Biomedical Engineering (ICABME) (pp. 134-137). IEEE. [5] Carney, P. R., Myers, S., & Geyer, J. D. (2011). Seizure prediction: methods. Epilepsy & behavior, 22, S94-S101. [6] Friston, K. J. (2011). Functional and effective connectivity: a review. Brain connectivity, 1(1), 13-36. [7] Pereda, E., Quiroga, R. Q., & Bhattacharya, J. (2005). Nonlinear multivariate analysis of neurophysiological signals. Progress in neurobiology, 77(1-2), 1-37. [8] Bastos, A. M., & Schoffelen, J. M. (2016). A tutorial review of functional connectivity analysis methods and their interpretational pitfalls. Frontiers in systems neuroscience, 9, 175. [9] Kovach, C. K. (2017). A biased look at phase locking: Brief critical review and proposed remedy. IEEE Transactions on signal processing, 65(17), 4468-4480. [10] Van Mierlo, P., Papadopoulou, M., Carrette, E., Boon, P., Vandenberghe, S., Vonck, K., & Marinazzo, D. (2014). Functional brain connectivity from EEG in epilepsy: Seizure prediction and epileptogenic focus localization. Progress in neurobiology, 121, 19-35. [11] Thorniley, J. (2011). An improved transfer entropy method for establishing causal effects in synchronizing oscillators. In ECAL (pp. 797-804). [12] Battaglia, D., Witt, A., Wolf, F., & Geisel, T. (2012). Dynamic effective connectivity of inter-areal brain circuits. PLoS computational biology, 8(3). [13] Brázdil, M., Halámek, J., Jurák, P., Daniel, P., Kuba, R., Chrastina, J. & Rektor, I. (2010). Interictal high-frequency oscillations indicate seizure onset zone in patients with focal cortical dysplasia. Epilepsy research, 90(1-2), 28-32. [14] B.J. Edelman, N. Johnson, A. Sohrabpour, S. Tong, N. Thakor, B. He, Systems neuroengineering: understanding and interacting with the brain, Engineering. 1 (2015) 292–308. doi:10.15302/j- eng2015078. [15] P. Van Mierlo, E. Carrette, H. Hallez, R. Raedt, A. Meurs, S. Vandenberghe, D. Van Roost, P. Boon, S. Staelens, K. Vonck, Ictal-onset localization through connectivity analysis of intracranial EEG signals in patients with refractory epilepsy, Epilepsia. 54 (2013) 1409–1418. doi:10.1111/epi.12206 [16] Bandarabadi, M., Teixeira, C. A., Rasekhi, J., & Dourado, A. (2015). Epileptic seizure prediction using relative spectral power features. Clinical Neurophysiology, 126(2), 237-248. [17] Usman, S. M., Usman, M., & Fong, S. (2017). Epileptic seizures prediction using machine learning methods. Computational and mathematical methods in medicine, 2017. [18] Wang, H. E., Bénar, C. G., Quilichini, P. P., Friston, K. J., Jirsa, V. K., & Bernard, C. (2014). A systematic framework for functional connectivity measures. Frontiers in neuroscience, 8, 405. [19] Ramani, R. G., Sivagami, G., & Jacob, S. G. (2012). Feature relevance analysis and classification of parkinson disease tele-monitoring data through data mining techniques. International Journal of Advanced Research in Computer Science and Software Engineering, 2(3). [20] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press. [21] Tsiouris, Κ. Μ., Pezoulas, V. C., Zervakis, M., Konitsiotis, S., Koutsouris, D. D., & Fotiadis, D. I. (2018). A Long Short-Term Memory deep learning network for the prediction of epileptic seizures using EEG signals. Computers in biology and medicine, 99, 24-37.
  • 18. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 18 [22] Cho, K. O., & Jang, H. J. (2020). Comparison of different input modalities and network structures for deep learning-based seizure detection. Scientific Reports, 10(1), 1-11. [23] Boonyakitanont, P., Lek-uthai, A., Chomtho, K., & Songsiri, J. (2019). A Comparison of Deep Neural Networks for Seizure Detection in EEG Signals. bioRxiv, 702654. [24] Gunning, D. (2017). Explainable artificial intelligence (xai). Defense Advanced Research Projects Agency (DARPA), nd Web, 2. [25] Weiss, S. A., Lemesiou, A., Connors, R., Banks, G. P., McKhann, G. M., Goodman, R. R., & Diehl, B. (2015). Seizure localization using ictal phase-locked high gamma: a retrospective surgical outcome study. Neurology, 84(23), 2320-2328. [26] Myers, M. H., Padmanabha, A., Hossain, G., de Jongh Curry, A. L., & Blaha, C. D. (2016). Seizure prediction and detection via phase and amplitude lock values. Frontiers in human neuroscience, 10, 80. [27] Douw, L., van Dellen, E., de Groot, M., Heimans, J. J., Klein, M., Stam, C. J., & Reijneveld, J. C. (2010). Epilepsy is related to theta band brain connectivity and network topology in brain tumor patients. BMC neuroscience, 11(1), 103. [28] Hossain, M. S., Amin, S. U., Alsulaiman, M., & Muhammad, G. (2019). Applying deep learning for epilepsy seizure detection and brain mapping visualization. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 15(1s), 1-17. [29] Zhang, X., Yao, L., Dong, M., Liu, Z., Zhang, Y., & Li, Y. (2019). Adversarial Representation Learning for Robust Patient-Independent Epileptic Seizure Detection. arXiv preprint arXiv:1909.10868 [30] Li, Q., Chen, Y., Wei, Y., Chen, S., Ma, L., He, Z., & Chen, Z. (2017). Functional Network Connectivity Patterns between Idiopathic Generalized Epilepsy with Myoclonic and Absence Seizures. Frontiers in computational neuroscience, 11, 38. [31] Daoud, H., & Bayoumi, M. A. (2019). Efficient epileptic seizure prediction based on deep learning. IEEE transactions on biomedical circuits and systems, 13(5), 804-813. [32] Akbarian, B., & Erfanian, A. (2019). A framework for seizure detection using effective connectivity, graph theory and deep modular neural networks. arXiv preprint arXiv:1909.03091 [33] Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European conference on computer vision (pp. 818-833). Springer, Cham. [34] Meka, A., Maximov, M., Zollhoefer, M., Chatterjee, A., Seidel, H. P., Richardt, C., & Theobalt, C. (2018). Lime: Live intrinsic material estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6315-6324). [35] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2921-2929). [36] Chattopadhyay, A., Sarkar, A., Howlader, P., & Balasubramanian, V. N. (2017). Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks. arXiv preprint arXiv:1710.11063. [37] Roy, D., Murty, K. S. R., & Mohan, C. K. (2015, July). Feature selection using deep neural networks. In 2015 International Joint Conference on Neural Networks (IJCNN) (pp. 1-6). IEEE. [38] Ramani, R. G., Sivagami, G., & Jacob, S. G. (2012). Feature relevance analysis and classification of parkinson disease tele-monitoring data through data mining techniques. International Journal of Advanced Research in Computer Science and Software Engineering, 2(3). [39] Phang, Chun-Ren & Numan, Fuad & Hussain, Hadri & Ting, Chee-Ming & Ombao, Hernando. (2019). A Multi-Domain Connectome Convolutional Neural Network for Identifying Schizophrenia from EEG Connectivity Patterns. IEEE Journal of Biomedical and Health Informatics. PP. 1-1. 10.1109/JBHI.2019.2941222. [40] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). “Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135-1144). [41] Abdelhameed, A. M., Daoud, H. G., & Bayoumi, M. (2018, June). Deep convolutional bidirectional LSTM recurrent neural network for epileptic seizure detection. In 2018 16th IEEE International New Circuits and Systems Conference (NEWCAS) (pp. 139-143). IEEE. [42] Fellous, J. M., Sapiro, G., Rossi, A., Mayberg, H. S., & Ferrante, M. (2019). Explainable Artificial Intelligence for Neuroscience: Behavioral Neurostimulation. Frontiers in Neuroscience, 13, 1346.
  • 19. International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021 19 AUTHORS Hmayag Partamian is a Ph.D. candidate in the Electrical and Computer Engineering (ECE) department at the American University of Beirut (AUB). He graduated with a B.E. in electrical and computer engineering from the Lebanese University in 2005 and masters and a M.S. in computational science from AUB in 2014. His research encompasses diverse signal analysis and machine learning algorithms such as seismic data analysis, biomedical signal analysis (EEG and ECG), image processing, and vibration analysis. His current research encompasses modeling of the brain during seizures using EEG data and developing detection and prediction models using decomposition techniques and machine learning algorithms. Fouad Khnaisser a recent computer and communication engineering graduate from the American University of Beirut. He first started as a research assistant in the AUB Mind Lab in his third year where he focused on analysing and classifying speech for different purposes like emotional speech classification and reproduction. Mohamad Mansour is a recent computer and communication engineering graduate from the American University of Beirut. He has been part of the Socially Competent Robotic and Agent Technologies - research group at CYENS Centre of Excellence in Cyprus in the Explainable AI field. His research interests are natural language processing, computer vision, reinforcement learning, and robotics. Reem A. Mahmoud is a Ph.D. candidate in the Electrical and Computer Engineering (ECE) department at the American University of Beirut (AUB). She graduated with a B.S. in Electrical Engineering with high distinction from Alfaisal University in Riyadh, Saudi Arabia, and an M.E. from AUB in 2015 and 2017, respectively. Her main area of research is theoretical machine learning with a focus on learning from limited time- series data. Her interests also extend to knowledge transferability and personalization in machine learning. Hazem Hajj is an Associate Professor with the American University of Beirut (AUB). He is a senior member of IEEE and ACM. Over the years, Hazem has established leadership in the field of artificial intelligence (AI) building on a strong mix of industry and academics experiences at Intel Corporation and AUB. He received his PhD from the University of Wisconsin-Madison in 1996, and his Bachelor from AUB with distinction. Over the years, Hazem has been the recipient of numerous academic and industry awards. His research interests include Artificial Intelligence (AI), Machine Learning and Energy-Aware Computing, with special interests in Natural Language Processing and Context Aware Sensing. His research has produced over 100 publications in the AI field in addition to multiple patents and awards. His research has been funded by local and international funding sources, including funding from Intel Corporation and Qatar National Research Fund (QNRF). Fadi N. Karameh is an Associate Professor in the Electrical and Computer Engineering Department at the American University of Beirut (AUB) in Beirut, Lebanon. Prof Karameh joined AUB in 2003 shortly after graduating from the Laboratory of Information and Decision Systems at the Massachusetts Institute of Technology (MIT) in Cambridge, USA. His research includes system-theoretic approaches in identification, estimation and signal processing in electrical engineering, with an emphasis on neurophysiological signals and systems. His interdisciplinary interests include developing identification and estimation tools for understanding nonlinear dynamic large-scale interactions in brain cortical networks from multichannel electrical activity recordings.