8421ijbes01

International Journal of Biomedical Engineering and Science (IJBES), Vol. 8, No. 1/2/3/4, October 2021
DOI: 10.5121/ijbes.2021.8401 1
A DEEP MODEL FOR EEG SEIZURE
DETECTION WITH EXPLAINABLE AI USING
CONNECTIVITY FEATURES
Hmayag Partamian, Fouad Khnaisser, Mohamad Mansour,
Reem Mahmoud, Hazem Hajj and Fadi Karameh
Department of Electrical and Computer Engineering,
American University of Beirut, Beirut, Lebanon
ABSTRACT
During seizures, different types of communication between different parts of the brain are characterized by
many state of the art connectivity measures. We propose to employ a set of undirected (spectral matrix, the
inverse of the spectral matrix, coherence, partial coherence, and phase-locking value) and directed
features (directed coherence, the partial directed coherence) to detect seizures using a deep neural
network. Taking our data as a sequence of ten sub-windows, an optimal deep sequence learning
architecture using attention, CNN, BiLstm, and fully connected neural networks is designed to output the
detection label and the relevance of the features. The relevance is computed using the weights of the model
in the activation values of the receptive fields at a particular layer. The best model resulted in 97.03%
accuracy using balanced MIT-BIH data subset. Finally, an analysis of the relevance of the features is
reported.
KEYWORDS
Seizure detection, deep sequence learning, brain connectivity, explainability, feature relevance.
1. INTRODUCTION
Epilepsy is a neurological disorder that affects around 50 million people of all ages worldwide
[1]. It is characterized by the frequent and repetitive occurrence of seizures that disrupt normal
function and affect the quality of life of the patient.
Neuroimaging techniques acquired from epilepsy patients showed evidence of different structural
and functional irregularities. Synchronous discharges of electrical activity across different parts
of the brain during seizures were captured using electroencephalogram (EEG) data [2]. To extract
meaningful information, a plethora of measures have been designed to characterize the changes
in the EEG signals during epilepsy that can be classified into two types. Univariate metrics
measure the information in a window of one time series and can be classified as temporal,
spectral, and entropy-based measures [3] [4] [5]. On the other hand, multivariate connectivity
metrics [6] [7] [8], characterize information between multiple time series and can be directed
(effective) or undirected (functional). For example, the phase-locking value (PLV) is an
undirected measure since it quantifies the synchrony between a pair of signals [9]. Granger
Causality (GC) calculates the amount of information transfer from one channel to the other which
makes it a directed measure [10] [11] [12]. Cross-frequency coupling (CFC) methods have also
been used in characterizing information between different brain regions by studying the
interaction between oscillations across different frequency bands such as phase-amplitude
coupling (PAM) [13].

2
Ictal episodes are those that exhibit seizure activity and will have different connectivity values
when compared to non-ictal periods [14]. Also, seizure onset zones usually get isolated in their
activity before the seizure starts which can be captured using a coherence connectivity matrix
[15]. Electrophysiological research also reports that functional connectivity analysis allows
localization of seizure onset zones (SOZ), whose exact location helps increase surgery success
rates [3]. Besides, variations in phase-locking value (PLV) can characterize the synchronous
activity between different parts of the brain during a seizure and non-seizure episodes [9].
In the last decade, advances in technology have made machine learning and big data analysis
available and many seizure detection algorithms were developed using diverse machine learning
algorithms. Support vector machines (SVM) is one of the common techniques used to learn
classification algorithms for seizure detection [16] [17] [3]. Usually, many classical tailored
features are fed to the SVM classifiers. Therefore, employing unnecessary features may hinder
both the performance and the speed of analysis [18]. Feature selection techniques were proposed
to reduce the feature set since machine learning algorithms often perform better when relevant
features are selected and used for learning [19]. An alternative branch of machine learning is
deep learning (DL) which studies multiple-layer architectures of neural networks that can learn
discriminating features and a classifier simultaneously [20]. The raw EEG data, snapshot images,
different univariate and bivariate measures were used as input to deep classifiers to detect
seizures and non-seizure episodes during epilepsy [21] [22] [23]. However, deep networks are
perceived as “black box” techniques since the role of the different layers and the overall internal
functioning are unknown. A major issue arises because scientists need to understand why such a
decision was made. Can such models be trusted without knowing why they fail and why they
succeed? Explainable artificial intelligence (XAI) [24] techniques try to derive explanations
from the parameters of the deep network to infer knowledge and build explainable features [13].
In epilepsy analysis, seizure detection is an important problem, however, for doctors to be able to
diagnose epilepsy, they need deeper information about the interactions of the brain regions such
as the localization of SOZ which can help identify a resection area for surgery. Experimental
studies have shown the phase-amplitude coupling [13] and the power in the high-frequency band
(>100Hz) [25] increase in the seizure onset zone and during the seizures. The SOZ also gets
disconnected from the rest of the brain regions that can be inferred using undirected connectivity
measures such as PLV [26]. Another study shows that the phase-lag index (PLI) connectivity in
the θ (4-8 Hz) band can be linked to tumor-related epilepsy [27]. Explainability during seizure
analysis were also addressed to find the feature relevance at different frequency bands [28] and
extract information from the learned weights to derive topographic brain maps [28] [29].
In this study, we selected a subset of the different connectivity measures to characterize the brain
signals from different perspectives. spectral matrix (SM), the inverse of the spectral matrix (IS),
coherence (COH), partial coherence (PC), and phase-locking value (PLV) as undirected
measures, and directed coherence (DC), and the partial directed coherence (PDC) as directed
features. We feed the chosen features to learn a deep model that not only classifies seizures but
also computes how much each of the features has participated in the decision made by the
detector using the deep learning model parameters. The weights of the model provide
explainability at the level of the features. This is of key importance for epilepsy analysis since it
provides the user with additional valuable information. Also, seizure events differ from one user
to the other and the model will be generally trained using different patient data that have different
seizure dynamics. The types of epilepsy in the training set can also be different and may exhibit
different connectivity values [30]. Therefore, when new data is tested, the proposed model will
detect whether a specific interval of EEG is a part of a seizure or not and also outputs the
percentage of the impact of each of the employed connectivity measures on the model’s decision.

3
We summarize our contributions as follows:
 Unlike classical methods where they use a single window, our model considers a 20-
second window with ten 2 second sub-windows to characterize interdependencies
between these windows through time and better describe seizure and non-seizure
intervals of the EEG data.
 The model is designed to perform seizure detection of the 20-second data using attention,
convolutional neural networks (CNN), fully connected layers (FCNN), and bidirectional
long short term memory (BiLstm) methodologies which is a type of a recurrent neural
network (RNN).
 In [37], different fusion methods were employed while building the architecture of the
models. We also employed different fusion methods and compared them to understand
relationships between features and the output.
 Finally, we infer from the weights of the network the relevance of the connectivity
measures on the decision made by the detector. We study the impact of the features on
the output generally across all patients as well as per patients. To the best of our
knowledge, no other work has studied explainability with connectivity analysis during
seizure classification.
The rest of the paper is organized as follows. In section 2, we present all the related work for this
study. In section 3, we explain our methodology providing the different designs and workflows of
the proposed method. In section 4, a series of experiments are conducted to evaluate the
performance of our design. In section 5, we discuss our findings, its limitations, and propose
possible future extensions. We finally conclude in section 6, providing a summary of our work.
2. RELATED WORK
Two tasks arise while working with epileptic data, classifying the sample as ictal or non-ictal or
classifying the window as preictal (the period that precedes an ictal phase) or inter-ictal (period of
normal activity). Even though we will tackle the ictal/non-ictal classification problem, studying
the preictal/inter-ictal problem is useful as it gives us intuition and inspiration to develop our
architecture. The state of the art for preictal and inter-ictal classification is produced by Daoud et
al. [31] where they describe a methodology to train, using the raw multichannel EEG data, a deep
convolutional autoencoder (DCAE) and uses the latent space representation i.e. (output of the
pre-trained encoder) of each recording as input to a BiLstm network that classifies the example
into preictal and inter-ictal. The Dataset used is the CHB-MIT EEG dataset recorded at
Children’s Hospital Boston and is publicly available. To narrow down the channels considered
(23 total channels), an iterative algorithm was used to select the channels. The algorithm
calculates the product between variance and entropy for each channel and iteratively trains the
model on bigger windows until all the channels are considered to select the best combination of
performance accuracy and computational cost. Through this approach, an accuracy of 99.66% is
achieved.
As for ictal/non-ictal prediction, Akbarian et al. [32] use effective brain connectivity, the directed
transfer function (DTF), the directed coherence (DC), and the generalized partial directed
coherence (GPDC) in various frequency bands, to measure the relation between brain regions.
They extracted features from each of these measures using graph theory. They fed each set of
newly extracted features to an autoencoder (AE) for feature reduction then used a softmax on the
output of the encoder, obtaining 3 classifiers. The decision of ictal/non-ictal is then inferred
through majority voting. Through this method and using the CHB-MIT dataset, they achieved a
99.43% accuracy.

4
Table 1 presents an overview of different architectures and their respective task-specific
accuracies. We also include the features used by each study. When using different features we
need to combine them to feed them to the network, different types of combinations/fusions exist
such as concatenating them at the input layer, or feeding them through separate networks and
concatenating their respective outputs and use them as input to the final output layer. Fusion
methods were applied in EEG with the Schizophrenia detection problem where they showcase the
effect of varying fusion mechanisms on the performance of the deep network [39]. Another
Table 1. Comparable Analysis
Ref. Task Feature
Selection
Deep Architecture
Acc. XAI
[23] Seizure Detection Raw, Spectral,
temporal, EEG
snapshot, spectrogram
FCNN, RNN, DNN
99.7% No
[28]
[29]
Seizure Detection
Seizure Detection
Raw Data
Raw Data
CNN
CNN, Attention,
FCNN
98.05%
-
Yes
Yes
[31] Seizure
Prediction
Raw data AE+ BiLstm 99.66% No
[32] Seizure Detection Directed Connectivity
and Graph Metrics
DNN 99.43% No
[37] Video Detection Trajectory Features DNN 93.33% Yes
[39] Schizophrenia
Detection
Mixed Connectivity CNN 91.69% No
[41] Seizure Detection Raw Data CNN, BiLstm 98.89% No
Proposed
Method
Seizure
Detection
7 Connectivity
Measures
CNN, BiLstm,
Attention, FCNN
97.03% Yes
approach for epilepsy detection uses the raw EEG data as input to a bidirectional recurrent
network that uses the past window knowledge to predict the next window’s label. The method
can discriminate normal-ictal and normal-ictal-interictal EEG signals accurately [41].
New techniques are currently developing and shaping into a field of study called explainable AI
(XAI) [24] where researchers try to make use of the learned blocks of information inside the deep
learning models. A modified deep learning model will also learn explainable features while
training. The first efforts employed the deep CNN using deconvolution methods to explain the
feature maps. Parts of the image that activated certain neurons were marked during the process
[33]. Another XAI algorithm, LIME (Local Interpretable Model-Agnostic Explanations), finds
for every test sample the relevance of a particular learned feature for a specific output using a
local approximation with a linear sparse model [34]. Class activation mapping (CAM) is another
method for saliency map generation primarily used for object localization in the image. CNN
with a final Global average pooling (GAP) layer was constructed and the weights of the GAP
layer were employed to compute heat maps showing the localization of objects in an image [35].

5
Better results were obtained with Grad-CAM++ which is a variation of CAM that uses the
gradient of the output class with respect to the activation of the feature maps to find better
saliency maps[36]. Roy et al. proposed a task aware selection of features by learning a deep
neural network (DNN) for action recognition using video as input. They extract a 426 trajectory
and motion features, and after learning the DNN, they study the activation potential normalized
over all layers to quantify feature relevance. The authors employ the first layer activation
functions to define a contribution measure for each feature [37]. Feature relevance was also
studied during Parkinson disease using data mining techniques [38].
Adversarial representation learning methods were employed for robust general seizure detection
models. In one study, a deep CNN model that employs 2-second window raw data as input was
analysed where the authors employ the weights of the learned model to visualize internal
functions of the network and extract feature maps. Using the maps as receptive fields in the
intermediate layers, they investigate domain-specific knowledge and class-discriminative features
using correlation maps in different frequency bands which were further processed to construct
scalp topographies [29]. None of these methods can provide deeper insights about the data. Our
proposed method is designed to keep track of the used features and provide explainability of the
measures used. The features that trigger the decision of the model will be revealed and this will
provide valuable information that can be related to the type of disease or type of interactions
during seizure and non-seizure episodes. We will also show that different seizure patients exhibit
diverse feature relevance maps which can be used for further analysis.
3. PROPOSED METHODOLOGY
3.1. Overview of the method
As depicted in figure 1, epileptic EEG data is segmented into short-duration fixed length
windows from which we extract the five common brain rhythms δ (2−4 Hz), θ (4−8 Hz), α (8−13
Hz), β (13−30 Hz), γ (greater than 30Hz) using Butterworth bandpass filters. For each of these
rhythms, we compute seven connectivity measures: Spectral matrix, inverse of spectral matrix,
coherence, partial coherence, directed coherence, partial directed coherence, and phase-locking
value.
Figure 1. The overall Workflow

6
These connectivity measures are arranged in a tensor and fed as features to a deep seizure
detection network. We evaluate and compare four different deep learning models by
manipulating the fusion mechanism. Since our features represent different perspective
connectivity measures, we intend to benefit from the rich information found in these features.
Similar to [37], we use the activation values of the learned model to output the type of
connectivity that quantifies participation percentage in making the model make a certain decision.
3.2. Data Preparation and Processing
The raw data were first pre-processed to extract 20-seconds long windows of seizure and non-
seizure data which were manually extracted using the labels provided with the data. Each of these
signals is divided further into ten 2-seconds long sub-windows resulting in a sequence of 10
seizure sub-windows that are fed to bandpass filters to extract five rhythms for each.
3.3. Feature Extraction
Taking our channels as a multivariate process , the multivariate linear shift-invariant filter
representation can be expressed by:
(1)
where is a vector of zero mean inputs and is matrix representing a filter
impulse response.
On the other hand, the multivariate autoregressive (MVAR) model of order can be expressed
as:
(2)
where can be considered as uncorrelated zero mean Gaussian noise. This model can allow
defining interactions between different signals such as coupling and causality using the matrices
since the term quantifies the causal linear interactions between and at lag k.
In the frequency domain, using the Fourier transform, the above equations yield
and . By comparing the two spectral
representations above, one can derive the following relation: .
The cross-spectral density matrix S(f) and its inverse P(f) are defined by
and where the superscript represents the
Hermitian transpose and represents the covariance of The coherence between the two
signals and at a frequency f is can now be derived as:
(3)

7
while the directed coherence can be expressed by:
(4)
where represents the variance of signal .
The partial coherence can also be derived in a similar fashion and can be represented by:
(5)
from which we can infer the partial directed coherence, PDC, defined by:
(6)
On the other hand, phase locking value is a measure that quantifies the synchrony between two
signals. The signals and are first band pass filtered in the specified frequency bands, and
then Hilbert transform is applied to extract the corresponding phases and . The phase
locking value (PLV) can be expressed as:
(7)
where N is the number of samples considered per window.
3.4. Deep Learning Model Architectures
The data formed is in the form of a tensor of dimension 7x10x19x19x5. Each data sample takes
in 10-time windows of the 7 connectivity features of 2-second intervals, each window is
represented by a 19x19x5 matrix where the third dimension represents the frequency band and
19x19 is the connectivity matrix. Each row and column represent an EEG channel so one entry in
this matrix represents the connectivity of the channel along the row with the channel along the
column in one of the 5 different frequency bands. In our four architectures, all convolution
operations use one filter because we are trying to capture a numerical combination of the input
matrix and not a characteristic (i.e. shade or edges) since it’s not an image. Furthermore, they
have a similar base architecture, but they use different schemes to combine the different features.
For our first architecture shown in figure 2, we separate the 7 features and the five frequency
bands of each feature, resulting into 35 independent inputs, and feed them into separate, identical
blocks as depicted in figure 2A. We process each input as a time series so each window passes
through one 2D convolution layer of kernel size of (19,1) . This operation helps us to condense
all the relations that a channel has with the other into one number. Since we obtained one feature
vector per window, we feed them to an LSTM block followed by a self-attention layer and fully

8
connected layers. Each one of the 35 inputs pass through this block and at the end, the vectors
obtained from the last FCNN layers are concatenated and fed to an FCNN layer for classification
as shown in figure 2B. This fusion scheme assumes total independence of the features and
frequency bands, the feature vectors are concatenated before the last FCNN layer. The second
fusion scheme, instead of combining the feature vectors in the last layer, we combine the vectors
obtained after the attention layer of each feature and feed them to FCNN layers as shown in
figure 3. For the third fusion scheme, after having a feature vector of size 19 for each window for
each feature we concatenate the feature vectors of similar time steps together, obtaining a 19x7
matrix then we proceed with a 2D convolution layer of kernel size (1,19) to get one feature vector
of size 19. We obtain channel-wise combination of the different features which we feed into an
LSTM block followed by an attention layer and FCNN layers (figure 4). Figure 5 depicts the
fourth fusion scheme, after having obtained a 19x19 matrix for each window of each feature we
concatenate the feature matrices of similar time steps together, obtaining a 19x19x7 matrix for
each time step then we proceed with a 3D convolution layer of kernel size (1,1,7) to get a 19x19
matrix in order to get a frequency wise combination of the different features and then proceed
with a 2D convolution layer of kernel size (1,19) followed
Figure 2. Model 1 Architecture

9
Figure 3. Model 2 Architecture
Figure 4. Model 3 architecture

10
Figure 5. Model 4 architecture
by an LSTM block, an Attention layer, and FCNN layers. Thus, our 4 fusion schemes aim at
reducing the level at which relationships between the features are the strongest. The first scheme
assumes no relation between features and frequencies, the second assumes a high-level relation,
the third assumes a channel-wise relation and the fourth one assumes a frequency wise relation.
3.5. Feature Relevance
For the XAI part, we planned to address one of the fundamental neuroscientific questions
concerned in finding relevant temporal and spatial scales necessary for given behaviour [28] by
doing a statistical study for the relationships between the derived connectivity measures varying
between spectral, causal, and phase-related, that characterize the brain signals on one hand and
the seizure-nonseizure for patient-specific and cross-patient cases on the other hand. We aim at
finding some link between the explained features contribution results to some of the scientific
facts concerning the seizure detection and neurology fields.
To apply XAI, we first targeted our research on the pre-processing and extracting different
connectivity measures (SM, ISM, DC, C, PDC, PC, PLV) across different frequency bands
) that we believe, based on some prior knowledge and experimental studies, have a
direct impact on seizure detection [13][25][26][27]. Then, we tried to achieve our explainability
in the post-modelling stage using the input-based explanation drivers methods where we base our
feature study on the output predictions [42]. As we see in the structure of model 2, we tried
separating each feature’s CNN-LSTM unit alone and concatenate them at a later stage, in such a
way we can simply do our study using the concatenation and the first dense layer only. This
allows us to keep track of the features.
The concatenation layer will combine the outputs from the 7 features separated CNN-LSTM units
into one flatten layer as depicted in figure 6 as vector . Our further investigation was based on
the feature extraction paper [37] at the first dense layer. To get the relevance of each of the input
neurons of the flatten layer, we first calculated the average absolute activation potential
contributed by the dimension of the input:
(8)

11
where the activation is: .
Then, we find the relative contribution of the input dimension towards the activation of
the hidden neuron:
(9)
In order to get the total net contribution is of an input dimension overall hidden layers, we then
computed the net of all ’s for every input overall :
(10)
Since, our input is a set of neurons per feature in order of the sequence SM, ISM, DC, C, GDPC,
PC, PLV respectively, then we further summed each of that sets alone to get the net per
feature. According to the feature extraction paper, the higher the contribution of an input
dimension, the more likely it is its participation in hidden neuronal activity and consequently,
classification [37].
Figure 6. Last layer architecture weights and activations from which relevance of features are computed
To begin our study, we first input the whole pre-processed dataset, cross-patient study, for seizure
and non-seizure cases separately in the first part of the model, the layers before the concatenation
part, and extracted the embeddings or the 400 neuron per feature and then got our results
according to the technique described above. The analysis is further extended to find inter-patient
feature relevance variations.

12
4. EXPERIMENTAL RESULTS
4.1. Experimental Setup
CHB-MIT is an EEG dataset collected at Children’s Hospital in Boston. It has 24 cases of
epilepsy. EEG was acquired using the international 10-20 system sampled at 256 samples per
second with a 16-bit resolution. Overall 198 seizures are annotated with the beginning and the
end of seizures. This particular dataset was chosen because it was used in many state of the art
papers which allows us to compare our deep learning model’s performance. As shown in figure 7,
we extract from this dataset 20 second intervals from seizure and non-seizure episodes. The
seizure of length less than 20 seconds, which were very few, were ignored in this study and the
seizure episodes of duration greater than 20 seconds were dissected into 20 second intervals. The
remainders were also considered by taking 20 seconds from the end. As for the non-seizure
episodes, four 20-second intervals were taken randomly from every dataset. The total number of
20 episode intervals collected contained 543 seizure intervals and 801 non-seizure intervals.
4.2. Validation Metrics
To evaluate the performance of the models, we use the statistical measures of binary
classification. We denote the seizure label as the positive class and the non-seizure case as the
negative class. We first define the following terms by
 True Positive (TP) : number of hits, correctly classified positives
 False Positive (FP): number false alarms, classified as seizure while it actually has no
seizure.
 False Negative (FN): number of misses, classified as non-seizure while it actually is
seizure.
 True Negative (TN): number of correct rejections, correctly classified negatives.
Sensitivity measures the percentage of positive class members that are correctly identified and is
given by:
(11)
Specificity gives the percentage of negative class members that are correctly identified whose
formula is:
(12)
Precision finds the positive predictive rate which explains how much of the positives were
identified.
(13)
Finally, the accuracy is computed using:
(14)

13
4.3. Feature Extraction
Each of these 20-second intervals was first filtered in the five frequency bands described above
and further dissected into ten 2-second intervals each of which is processed to extract the seven
described features. The ten sub-intervals are then gathered into a single tensor resulting in a
tensor of size 7x10x19x19x5 as shown in figure 7.
Figure 7. Data preparation and feature extraction
4.4. Performance Results of the proposed models
The models were evaluated on a data split of 85% training and 15% testing. After fine tuning, the
first model performed the best with a 97.03% accuracy as you can see in Table 2. The results also
show that the earlier the fusion scheme the weaker the relation between the features. Since our
data dissection method is totally new, no quantitative comparative analysis was performed. We
train all our models twice, the first time non-seizure data has a label of 1 and seizure data has a
label of 0 and the second time, using the resulting weights from the first time, the labels are
flipped, that procedure forces the model to learn more robust features for each class. The data
was too big to load into memory, so we had to write our own data generator that fetches batches
of 32 samples from the folder. The hyper parameters for each model are as follows:
 Model1: a relu activation function for both dense and convolution layers, 2 LSTM layers
and 1 dense layers of 100 neurons for each block and a dropout rate of 0.5. The optimizer
used is Adam and 30 epochs for both training phases.
 Model2: a relu activation function for both dense and convolution layers with a Spatial
Dropout2D of 0.07, we use 2 LSTM layers and all dense layers have Dropout of 0.5. We
use 2 dense layers of 263 and 20 neurons respectively. The optimizer used for the first
training phase was RMSprop for 17 epochs, after the flip Nadam was used as optimizer for
8 epochs.

14
and 3 dense layers of 500 neurons for each block and a dropout rate of 0.5 and l2
regularization. The optimizer used is Adam and 80 epochs for both training phases.
and 3 dense layers of 200 neurons for each block and a dropout rate of 0.3 and l2
regularization. The optimizer used is Adam and 100 epochs for both training phases.
The performance of the different modes on training and testing data are tabulated in table II. The
results are the average of ten runs with different splits each time. The overall accuracy,
sensitivity, specificity, and the precision are shown. While the model is well fit to the training
data in all models, unseen testing data performance is considered to choose the best model.
Model 1 performs the best in all measures recording a sensitivity of 97.65%, a specificity of
96.58%, precision 95.4%, and an overall accuracy of 97.03%.
Table 2. Performance Metrics Across all proposed models
Data Sensitivity Specificity Precision Accuracy
Model 1 Training 100.00 99.85 99.78 99.91
Model 1 Testing 97.65 96.58 95.40 97.03
Model 2 Training 99.13 98.10 97.22 98.51
Model 2 Testing 94.67 96.06 93.42 95.54
Model 3 Training 99.79 100.00 100.00 99.91
Model 3 Testing 80.23 92.24 88.46 87.13
Model 4 Training 96.08 96.49 94.84 96.32
Model 4 Testing 77.22 86.18 78.21 82.67
4.5. Feature Relevance Results
Figure 8 represents the overall feature relevance of all the data. We can see that spectral matrix
and partial coherence have higher relevance during seizures on average while non-seizure
decisions are made more effective with the coherence and the partial directed coherence.
Figure 8. Average feature relevance across all data

15
Cross-patient test results show the variation of the feature relevance diagrams for each of the tests
as can be seen in figures 9 and 10. We notice that every patient had different feature relevance
plot than the other, such that there is a different way in assigning the relevance and weights of
each feature which can be interpreted as a validation of the scientific fact that EEG patterns in
seizure patients are highly variable across patients. On the other hand, there were some changes
across the same patient results which can be explained on basing the features on both time and
frequency domains and EEG seizure pattern data are highly dynamic in nature for the same
patient.
Fig. 9. Feature relevance for patient 15
Figure. 10 Feature relevance for patient 20
Another interesting finding is that the directed coherence feature is often assigned with the least
net contribution The phase-locking value is often assigned with good relevance value in
seizure and non-seizure compared to other features which might be an indication for the fact that
PLV is one of the undirected connectivity measures that can infer the disconnection of the parts
of the brain where the seizure onset zones get disconnected [26].

16
5. DISCUSSION AND FUTURE WORK
EEG data analysis has been studied using EEG data and many of the techniques rely on the data
dissection and the features used. In this study, we adopted a sequential portion of EEG data
which contains information about a sequence of ten consecutive EEG sub-windows. This choice
was made because during seizures, the signals may alternate from seizure to non-seizure states.
Also, during non-seizure data, short bursts of seizure activity may arise and can be labelled as
seizure if taken separately. The BiLstm structure between these ten windows can learn
relationships between the windows during a seizure and avoid such misclassifications. Besides,
seizure data is generally noisy and stochastic in nature, and having relating higher-level
information between consecutive windows can help learn more complex relationships across
time.
Our study focuses on discriminating seizure/non-seizure while providing explanations of our
learned model which make it more presented different CNN-Lstm models for detecting seizure
based on long windows. Our models used different fusion strategies where the first two models
combine the features at the decision level, and the two combined them at the input level. We were
able to show that combining features at the decision layers yield in a much better performance
which can be an indication that the features are better learned when separated in the feature
extraction part. In our study, we made use of the latest techniques and advancement mainly in the
regularization and the normalization methods that helped our architecture in achieving better
results. We employed various fusion mechanisms and conclude that fusion at the input is not
performing well compared to fusion the features at the end which makes CNN able to learn and
extract its high order representation better.
Feature extraction for all the data windows considered in this study was computationally very
expensive and took few days. This can be accelerated using GPU programming and distributed
and parallel computing methodologies. Many extensions are possible at the level of architecture,
feature selection, and explainablity. We can investigate different architectures to capture other
relations, since the assumption that we started from in this study is that the smallest input possible
is a window of the same feature in different frequencies, we would have to investigate if the
smallest possible input window of the same frequency in different features yields better accuracy
showing that the seizure is more related at the frequency level rather than the feature level and we
will be able to do XAI to deduce which frequency band is the most important. Our methods need
to encompass other EEG datasets to find more general models and analyse other seizure patient
biomarkers. Finally, our methods can be extended to learn sequence relationships at transition
episodes where states shift from pre-ictal to ictal as well as ictal to post-ictal transitions to
characterize the state transitions during seizures.
Finally, we explained the relevance of each feature to characterize the dynamics during seizures.
In the future, these values can be extended to capture the relevant features in specific frequency
bands thus providing spectral information along with the type of connectivity between channels
and hence more explanation about the seizures.
6. CONCLUSION
Seizure detection using a sequence model was proposed in this study. We have shown that
relating higher resolution data together as a sequence can characterize differences between
seizure and non-seizure data. Among the four studied models based on deep learning, model 1
which used fusion at the level of the decision recorded 97.03% accuracy, 97.65% sensitivity,
96.58% specificity and a 95.40% precision. The models were based on different neural network

17
architectures mainly on CNN and LSTM with attention layers. The learned weights of the model
helped understand the relevance of the chosen features and further showed that they can represent
cross-patient discriminative features and open way to many future studies for seizure analysis.
REFERENCES
[1] World Health Organization. (2006). Neurological disorders: public health challenges. World Health
Organization.
[2] Truccolo, W., Ahmed, O. J., Harrison, M. T., Eskandar, E. N., Cosgrove, G. R., Madsen, J. R., ... &
Cash, S. S. (2014). Neuronal ensemble synchrony during human focal seizures. Journal of
Neuroscience, 34(30), 9927-9944.
[3] Alotaiby, T. N., Alshebeili, S. A., Alshawi, T., Ahmad, I., & El-Samie, F. E. A. (2014). EEG seizure
detection and prediction algorithms: a survey. EURASIP Journal on Advances in Signal
Processing, 2014(1), 183.
[4] Zaylaa, A. J., Harb, A., Khatib, F. I., Nahas, Z., & Karameh, F. N. (2015, September). Entropy
complexity analysis of electroencephalographic signals during pre-ictal, seizure and post-ictal brain
events. In 2015 International Conference on Advances in Biomedical Engineering (ICABME) (pp.
134-137). IEEE.
[5] Carney, P. R., Myers, S., & Geyer, J. D. (2011). Seizure prediction: methods. Epilepsy &
behavior, 22, S94-S101.
[6] Friston, K. J. (2011). Functional and effective connectivity: a review. Brain connectivity, 1(1), 13-36.
[7] Pereda, E., Quiroga, R. Q., & Bhattacharya, J. (2005). Nonlinear multivariate analysis of
neurophysiological signals. Progress in neurobiology, 77(1-2), 1-37.
[8] Bastos, A. M., & Schoffelen, J. M. (2016). A tutorial review of functional connectivity analysis
methods and their interpretational pitfalls. Frontiers in systems neuroscience, 9, 175.
[9] Kovach, C. K. (2017). A biased look at phase locking: Brief critical review and proposed
remedy. IEEE Transactions on signal processing, 65(17), 4468-4480.
[10] Van Mierlo, P., Papadopoulou, M., Carrette, E., Boon, P., Vandenberghe, S., Vonck, K., &
Marinazzo, D. (2014). Functional brain connectivity from EEG in epilepsy: Seizure prediction and
epileptogenic focus localization. Progress in neurobiology, 121, 19-35.
[11] Thorniley, J. (2011). An improved transfer entropy method for establishing causal effects in
synchronizing oscillators. In ECAL (pp. 797-804).
[12] Battaglia, D., Witt, A., Wolf, F., & Geisel, T. (2012). Dynamic effective connectivity of inter-areal
brain circuits. PLoS computational biology, 8(3).
[13] Brázdil, M., Halámek, J., Jurák, P., Daniel, P., Kuba, R., Chrastina, J. & Rektor, I. (2010). Interictal
high-frequency oscillations indicate seizure onset zone in patients with focal cortical
dysplasia. Epilepsy research, 90(1-2), 28-32.
[14] B.J. Edelman, N. Johnson, A. Sohrabpour, S. Tong, N. Thakor, B. He, Systems neuroengineering:
understanding and interacting with the brain, Engineering. 1 (2015) 292–308. doi:10.15302/j-
eng2015078.
[15] P. Van Mierlo, E. Carrette, H. Hallez, R. Raedt, A. Meurs, S. Vandenberghe, D. Van Roost, P. Boon,
S. Staelens, K. Vonck, Ictal-onset localization through connectivity analysis of intracranial EEG
signals in patients with refractory epilepsy, Epilepsia. 54 (2013) 1409–1418. doi:10.1111/epi.12206
[16] Bandarabadi, M., Teixeira, C. A., Rasekhi, J., & Dourado, A. (2015). Epileptic seizure prediction
using relative spectral power features. Clinical Neurophysiology, 126(2), 237-248.
[17] Usman, S. M., Usman, M., & Fong, S. (2017). Epileptic seizures prediction using machine learning
methods. Computational and mathematical methods in medicine, 2017.
[18] Wang, H. E., Bénar, C. G., Quilichini, P. P., Friston, K. J., Jirsa, V. K., & Bernard, C. (2014). A
systematic framework for functional connectivity measures. Frontiers in neuroscience, 8, 405.
[19] Ramani, R. G., Sivagami, G., & Jacob, S. G. (2012). Feature relevance analysis and classification of
parkinson disease tele-monitoring data through data mining techniques. International Journal of
Advanced Research in Computer Science and Software Engineering, 2(3).
[20] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
[21] Tsiouris, Κ. Μ., Pezoulas, V. C., Zervakis, M., Konitsiotis, S., Koutsouris, D. D., & Fotiadis, D. I.
(2018). A Long Short-Term Memory deep learning network for the prediction of epileptic seizures
using EEG signals. Computers in biology and medicine, 99, 24-37.

18
[22] Cho, K. O., & Jang, H. J. (2020). Comparison of different input modalities and network structures for
deep learning-based seizure detection. Scientific Reports, 10(1), 1-11.
[23] Boonyakitanont, P., Lek-uthai, A., Chomtho, K., & Songsiri, J. (2019). A Comparison of Deep
Neural Networks for Seizure Detection in EEG Signals. bioRxiv, 702654.
[24] Gunning, D. (2017). Explainable artificial intelligence (xai). Defense Advanced Research Projects
Agency (DARPA), nd Web, 2.
[25] Weiss, S. A., Lemesiou, A., Connors, R., Banks, G. P., McKhann, G. M., Goodman, R. R., & Diehl,
B. (2015). Seizure localization using ictal phase-locked high gamma: a retrospective surgical
outcome study. Neurology, 84(23), 2320-2328.
[26] Myers, M. H., Padmanabha, A., Hossain, G., de Jongh Curry, A. L., & Blaha, C. D. (2016). Seizure
prediction and detection via phase and amplitude lock values. Frontiers in human neuroscience, 10,
80.
[27] Douw, L., van Dellen, E., de Groot, M., Heimans, J. J., Klein, M., Stam, C. J., & Reijneveld, J. C.
(2010). Epilepsy is related to theta band brain connectivity and network topology in brain tumor
patients. BMC neuroscience, 11(1), 103.
[28] Hossain, M. S., Amin, S. U., Alsulaiman, M., & Muhammad, G. (2019). Applying deep learning for
epilepsy seizure detection and brain mapping visualization. ACM Transactions on Multimedia
Computing, Communications, and Applications (TOMM), 15(1s), 1-17.
[29] Zhang, X., Yao, L., Dong, M., Liu, Z., Zhang, Y., & Li, Y. (2019). Adversarial Representation
Learning for Robust Patient-Independent Epileptic Seizure Detection. arXiv preprint
arXiv:1909.10868
[30] Li, Q., Chen, Y., Wei, Y., Chen, S., Ma, L., He, Z., & Chen, Z. (2017). Functional Network
Connectivity Patterns between Idiopathic Generalized Epilepsy with Myoclonic and Absence
Seizures. Frontiers in computational neuroscience, 11, 38.
[31] Daoud, H., & Bayoumi, M. A. (2019). Efficient epileptic seizure prediction based on deep
learning. IEEE transactions on biomedical circuits and systems, 13(5), 804-813.
[32] Akbarian, B., & Erfanian, A. (2019). A framework for seizure detection using effective connectivity,
graph theory and deep modular neural networks. arXiv preprint arXiv:1909.03091
[33] Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional
networks. In European conference on computer vision (pp. 818-833). Springer, Cham.
[34] Meka, A., Maximov, M., Zollhoefer, M., Chatterjee, A., Seidel, H. P., Richardt, C., & Theobalt, C.
(2018). Lime: Live intrinsic material estimation. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (pp. 6315-6324).
[35] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for
discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 2921-2929).
[36] Chattopadhyay, A., Sarkar, A., Howlader, P., & Balasubramanian, V. N. (2017). Grad-CAM++:
Improved Visual Explanations for Deep Convolutional Networks. arXiv preprint arXiv:1710.11063.
[37] Roy, D., Murty, K. S. R., & Mohan, C. K. (2015, July). Feature selection using deep neural networks.
In 2015 International Joint Conference on Neural Networks (IJCNN) (pp. 1-6). IEEE.
[38] Ramani, R. G., Sivagami, G., & Jacob, S. G. (2012). Feature relevance analysis and classification of
parkinson disease tele-monitoring data through data mining techniques. International Journal of
Advanced Research in Computer Science and Software Engineering, 2(3).
[39] Phang, Chun-Ren & Numan, Fuad & Hussain, Hadri & Ting, Chee-Ming & Ombao, Hernando.
(2019). A Multi-Domain Connectome Convolutional Neural Network for Identifying Schizophrenia
from EEG Connectivity Patterns. IEEE Journal of Biomedical and Health Informatics. PP. 1-1.
10.1109/JBHI.2019.2941222.
[40] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). “Why should I trust you?" Explaining the
predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on
knowledge discovery and data mining (pp. 1135-1144).
[41] Abdelhameed, A. M., Daoud, H. G., & Bayoumi, M. (2018, June). Deep convolutional bidirectional
LSTM recurrent neural network for epileptic seizure detection. In 2018 16th IEEE International New
Circuits and Systems Conference (NEWCAS) (pp. 139-143). IEEE.
[42] Fellous, J. M., Sapiro, G., Rossi, A., Mayberg, H. S., & Ferrante, M. (2019). Explainable Artificial
Intelligence for Neuroscience: Behavioral Neurostimulation. Frontiers in Neuroscience, 13, 1346.

19
AUTHORS
Hmayag Partamian is a Ph.D. candidate in the Electrical and Computer Engineering
(ECE) department at the American University of Beirut (AUB). He graduated with a
B.E. in electrical and computer engineering from the Lebanese University in 2005 and
masters and a M.S. in computational science from AUB in 2014. His research
encompasses diverse signal analysis and machine learning algorithms such as seismic
data analysis, biomedical signal analysis (EEG and ECG), image processing, and
vibration analysis. His current research encompasses modeling of the brain during
seizures using EEG data and developing detection and prediction models using decomposition techniques
and machine learning algorithms.
Fouad Khnaisser a recent computer and communication engineering graduate from
the American University of Beirut. He first started as a research assistant in the AUB
Mind Lab in his third year where he focused on analysing and classifying speech for
different purposes like emotional speech classification and reproduction.
Mohamad Mansour is a recent computer and communication engineering graduate
from the American University of Beirut. He has been part of the Socially Competent
Robotic and Agent Technologies - research group at CYENS Centre of Excellence in
Cyprus in the Explainable AI field. His research interests are natural language
processing, computer vision, reinforcement learning, and robotics.
Reem A. Mahmoud is a Ph.D. candidate in the Electrical and Computer Engineering
(ECE) department at the American University of Beirut (AUB). She graduated with a
B.S. in Electrical Engineering with high distinction from Alfaisal University in Riyadh,
Saudi Arabia, and an M.E. from AUB in 2015 and 2017, respectively. Her main area of
research is theoretical machine learning with a focus on learning from limited time-
series data. Her interests also extend to knowledge transferability and personalization
in machine learning.
Hazem Hajj is an Associate Professor with the American University of Beirut (AUB).
He is a senior member of IEEE and ACM. Over the years, Hazem has established
leadership in the field of artificial intelligence (AI) building on a strong mix of
industry and academics experiences at Intel Corporation and AUB. He received his
PhD from the University of Wisconsin-Madison in 1996, and his Bachelor from AUB
with distinction. Over the years, Hazem has been the recipient of numerous academic
and industry awards. His research interests include Artificial Intelligence (AI),
Machine Learning and Energy-Aware Computing, with special interests in Natural Language Processing
and Context Aware Sensing. His research has produced over 100 publications in the AI field in addition to
multiple patents and awards. His research has been funded by local and international funding sources,
including funding from Intel Corporation and Qatar National Research Fund (QNRF).
Fadi N. Karameh is an Associate Professor in the Electrical and Computer
Engineering Department at the American University of Beirut (AUB) in Beirut,
Lebanon. Prof Karameh joined AUB in 2003 shortly after graduating from the
Laboratory of Information and Decision Systems at the Massachusetts Institute of
Technology (MIT) in Cambridge, USA. His research includes system-theoretic
approaches in identification, estimation and signal processing in electrical engineering,
with an emphasis on neurophysiological signals and systems. His interdisciplinary interests include
developing identification and estimation tools for understanding nonlinear dynamic large-scale interactions
in brain cortical networks from multichannel electrical activity recordings.

8421ijbes01

More Related Content

What's hot (20)

Similar to 8421ijbes01 (20)

Recently uploaded (20)

8421ijbes01