Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Datasets

Safa Adi #1, Mohammed Aldasht *2

# Computer and IT Department, Palestine Polytechnic University, Palestine
1 safa_adi@ppu.edu
* Computer Engineering Department, Palestine Polytechnic University, Palestine
2 mohammed@ppu.edu
Abstract—Feature selection in high-dimensional datasets is considered to be a complex and time-consuming problem. To enhance the accuracy of classification and reduce the execution time, Parallel Evolutionary Algorithms (PEAs) can be used. In this paper, we review the most recent works that address the use of PEAs for feature selection in large datasets. We classify the algorithms in these papers into four main classes: Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Scatter Search (SS), and Ant Colony Optimization (ACO). The accuracy is adopted as a measure to compare the efficiency of these PEAs. It is noticeable that the Parallel Genetic Algorithms (PGAs) are the most suitable algorithms for feature selection in large datasets, since they achieve the highest accuracy. On the other hand, we found that the parallel ACO is time-consuming and less accurate compared with the other PEAs.
Index Terms: Evolutionary algorithms, parallel computing, classification, feature selection, high dimensional dataset.
I. INTRODUCTION
Nowadays, many disciplines have to deal with high-dimensional datasets that involve a huge number of features, so data preprocessing methods and data reduction models are needed to simplify the input data.
There are two main types of data reduction models [1]. The first comprises instance selection and instance generation processes, which operate at the instance level (i.e., they select a representative portion of the data that can fulfill a data mining task as if the whole dataset were used) [14]. The second comprises feature selection and feature extraction models, which work at the level of characteristics. These models attempt to reduce a dataset by removing noisy, irrelevant, or redundant features. Feature selection is a necessary preprocessing step in analyzing big datasets. It often leads to smaller data that make classifier training better and faster [3].
Feature selection is challenging for big datasets. In order to make classification faster and more accurate, we need to select the subset of features that are most discriminative. Evolutionary algorithms such as genetic algorithms, swarm intelligence optimization, and ant colony optimization can be effective for this problem, but they require a huge amount of computation (long execution times) and considerable memory. Parallel computing can be used to overcome these weaknesses.
In this survey, we review a set of papers about parallel evolutionary algorithms used for feature selection in large datasets. Furthermore, we compare the performance of the different algorithms and environments.
The rest of the paper is organized as follows: Section II gives background about feature selection approaches and parallel architectures in general. Section III presents parallel evolutionary algorithms. Section IV discusses and reviews papers that tackle the feature selection problem using parallel computing. Section V contains the summary of the survey, and the last section presents the conclusion and future work.
II. BACKGROUND
In general, there are three classes of feature selection methods: filter-based, wrapper, and embedded. The filter approach analyzes the features statistically and ignores the classifier [18]. Most filter-based methods perform two operations, ranking and subset selection: in some cases these two operations are performed sequentially, first the ranking and then the selection; in other cases only the selection is carried out. These methods are effective in terms of execution time. However, filter methods sometimes select redundant variables, since they do not consider the relationships between variables; therefore, they are mainly used as a pre-processing step. In the wrapper model [15], the feature selection process depends on the performance of a specific classifier; its disadvantages are long execution times and overfitting. The last
method for feature selection is the embedded one, in which the feature selection process and the learning algorithm (tuning the parameters) are combined with each other [6, 15].
The selection of an optimal feature subset is an optimization problem that has proved to be NP-hard, complex, and time-consuming [13]. Two major approaches are traditionally used to tackle NP-hard problems, as seen in Figure 1: exact methods and metaheuristics. Exact methods allow the exact solution to be found, but this approach is impractical since it is extremely time-consuming for real-world problems. Metaheuristics, on the other hand, are used for solving complex real-world problems because they provide suboptimal (sometimes optimal) solutions in reasonable time [2, 11, 13].
As seen in Figure 1, metaheuristics are divided into two categories [13]:
• Trajectory-based (exploitation-oriented) methods: the well-known metaheuristic families based on the manipulation of a single solution. These include Simulated Annealing (SA), Tabu Search (TS), Iterated Local Search (ILS), Variable Neighborhood Search (VNS), and Greedy Randomized Adaptive Search Procedures (GRASP).
• Population-based (exploration-oriented) methods: the well-known metaheuristic families based on the manipulation of a population of solutions. These include PSO, ACO, SS, Evolutionary Algorithms (EAs), Differential Evolution (DE), Evolution Strategies (ES), and Estimation of Distribution Algorithms (EDA).
Fig. 1. Approaches for handling NP-hard problems
Metaheuristic algorithms have proved to be suitable tools for solving the feature selection problem accurately and efficiently for large dimensions in big datasets [2]. There are two main problems when dealing with big datasets. The first is execution time, because the complexity of the metaheuristic methods for feature selection is at least O(n² · D), where n is the number of instances and D is the number of features. The second is memory consumption, since most methods for feature selection need to store the whole dataset in memory. Therefore, researchers try to parallelize the sequential metaheuristics to improve their efficiency for feature selection on large datasets. There are many programming models and paradigms, such as MapReduce (Hadoop, Spark), MPI, OpenMP, and CUDA [1, 6, 13]. Parallel computing can be organized by process interaction (shared memory, message passing) or by problem decomposition (task or data parallelization) [6].
Parallel computing is a good solution for these problems, since many calculations are carried out simultaneously on tasks and/or data [6]. Population-based metaheuristics are naturally suited to parallelization, since most of their variation operators can easily be executed in parallel [2, 13].
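To make this point concrete, the following minimal sketch (our illustration, not taken from any of the surveyed papers) evaluates the fitness of a whole population concurrently with a Python process pool; the bit-vector encoding and the toy fitness function are assumptions made for the example.

from multiprocessing import Pool
import random

def fitness(mask):
    # Placeholder fitness: reward small feature subsets. A real wrapper
    # method would train and test a classifier on the selected features.
    return -sum(mask)

def evaluate_population(population, workers=4):
    # Each individual is scored independently, so the map is trivially parallel.
    with Pool(workers) as pool:
        return pool.map(fitness, population)

if __name__ == "__main__":
    population = [[random.randint(0, 1) for _ in range(20)] for _ in range(8)]
    print(evaluate_population(population))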
Parallel implementations of metaheuristics are an effective alternative for speeding up sequential metaheuristics, since they reduce the search time for solutions of optimization problems. Furthermore, they lead to more precise randomized algorithms and improve the quality of solutions [11]. As seen in Figure 2, the implementation of parallel metaheuristics is divided into two categories [13].
Fig. 2. Parallel implementation of metaheuristics
Parallel evolutionary algorithms are also used in many applications other than feature selection, such as inferring phylogenies and traffic prediction. In [9], Santander et al. used MPI/OpenMP with a hybrid multiobjective evolutionary algorithm (fast non-dominated sorting genetic algorithm and firefly algorithm) for phylogenetic reconstruction (inferring evolutionary trees). In [10], Jiri et al. used a parallel multiobjective GA with OpenMP in order to make traffic prediction more accurate. A master-slave scheme of the GA was implemented on a multi-core parallel architecture. They reduced the computational time, but the approach was successful only for short-term traffic prediction.
III. OVERVIEW OF PARALLEL EVOLUTIONARY
ALGORITHMS FOR FEATURE SELECTION
Feature selection algorithms are used to find an optimal subset of relevant features in the data. In this section we describe the parallel evolutionary algorithms that are used for the feature selection problem in large datasets, illustrating the steps of six algorithms (PGA, PCHC, PPSO, PGPSO, PSS, and PACO).
A. Parallel Genetic Algorithm (PGA)
In order to increase the efficiency and reduce the execution time of the genetic algorithm (GA), researchers use the parallel GA. Algorithm 1 presents the parallel GA methodology, with the master-slave model of the parallel GA.
Algorithm 1 Parallel genetic algorithm [10]
Create initial population
Evaluate initial population
Create slaves
while not done do
    Start slave
    Wait for slave to finish
    Run mutation operator
end while
for i = 1 to slave iterations do
    Select individuals
    Run crossover operator
    Evaluate offspring
    if solution found then
        set done = True
    end if
end for
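The listing below is a minimal Python sketch of this master-slave scheme: the master performs selection, crossover, and mutation, while a pool of slaves evaluates fitness in parallel. The bit-vector encoding and the toy fitness are our assumptions, not the setup of [10].

from multiprocessing import Pool
import random

N_FEATURES, POP_SIZE, GENERATIONS = 30, 20, 40

def fitness(mask):
    # Toy fitness: favor subsets of exactly 5 features.
    return -abs(sum(mask) - 5)

def crossover(a, b):
    cut = random.randrange(1, N_FEATURES)      # one-point crossover
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.02):
    return [1 - g if random.random() < rate else g for g in mask]

if __name__ == "__main__":
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(POP_SIZE)]
    with Pool(4) as slaves:
        for _ in range(GENERATIONS):
            scores = slaves.map(fitness, pop)  # master waits for the slaves
            ranked = [m for _, m in sorted(zip(scores, pop), reverse=True)]
            parents = ranked[:POP_SIZE // 2]   # selection on the master
            pop = [mutate(crossover(random.choice(parents),
                                    random.choice(parents)))
                   for _ in range(POP_SIZE)]
    print(max(pop, key=fitness))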
B. Parallel CHC algorithm (PCHC)
CHC is a non-traditional GA that combines a conservative selection strategy (which always preserves the best individuals found so far) with a recombination operator that produces offspring at the maximum Hamming distance from their parents. The main processes of the CHC algorithm are [1] (the first and third are sketched after this list):
• Half-Uniform Crossover (HUX): produces two offspring that are maximally different from their two parents.
• Elitist selection: keeps the best solutions in each generation.
• Incest prevention: prevents two individuals from mating if the similarity between them is greater than a threshold.
• The restarting process: if the population stagnates, generates a new population by choosing the best individuals.
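The sketch below illustrates the HUX operator and the incest-prevention test in Python (our reading of the description above; the threshold value and 16-bit strings are illustrative).

import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def hux(a, b):
    # Swap exactly half of the differing bit positions, so each offspring
    # is maximally different from its parents.
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    random.shuffle(diff)
    swap = set(diff[:len(diff) // 2])
    c1 = [b[i] if i in swap else a[i] for i in range(len(a))]
    c2 = [a[i] if i in swap else b[i] for i in range(len(a))]
    return c1, c2

def may_mate(a, b, threshold):
    # Incest prevention: individuals that are too similar do not mate.
    return hamming(a, b) / 2 > threshold

if __name__ == "__main__":
    p1 = [random.randint(0, 1) for _ in range(16)]
    p2 = [random.randint(0, 1) for _ in range(16)]
    if may_mate(p1, p2, threshold=2):
        print(hux(p1, p2))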
C. Particle Swarm Optimization (PSO)
This subsection covers geometric particle swarm optimization (GPSO) and presents the algorithms used to parallelize PSO and GPSO.
1) Geometric Particle Swarm Optimization (GPSO): GPSO is a recent version of PSO. The key idea in GPSO is the use of a multi-parental recombination of solutions (particles). In the first phase, a random initialization of particles is created. Then the algorithm evaluates these particles to update the historical and social positions. Finally, the three-parent mask-based crossover (3PMBCX) moves the particles, as shown in Algorithm 2:
Algorithm 2 GPSO algorithm [2]
S := SwarmInitialization()
while not stop condition do
    for each particle i of the swarm S do
        evaluate(solution(xi))
        update(velocity equation (hi))
        update(global best solution (gi))
    end for
    for each particle i of the swarm S do
        xi := 3PMBCX((xi, wa), (gi, wb), (hi, wc))
        mutate(xi)
    end for
end while
Output: best solution found
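As an illustration of the recombination step, the sketch below gives a minimal 3PMBCX for binary particles: each bit of the child is copied from the current position, the historical best, or the global best, with probabilities given by the three weights. The weight values are assumptions for the example, not the tuned settings of [2].

import random

def three_pmbcx(x, h, g, wa=0.34, wb=0.33, wc=0.33):
    # Three-parent mask-based crossover: every position is inherited from
    # one of the three parents, chosen with probability wa, wb, or wc.
    return [random.choices([xi, hi, gi], weights=[wa, wb, wc])[0]
            for xi, hi, gi in zip(x, h, g)]

if __name__ == "__main__":
    x = [0] * 10            # current position
    h = [1, 0] * 5          # historical (personal best) position
    g = [1] * 10            # global best position
    print(three_pmbcx(x, h, g))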
2) Parallel Multi-Swarm Optimization (PMSO): Parallel multi-swarm optimization is presented in [2]. It is defined, in analogy with the parallel GA, as a pair (S, M), where S is a collection of swarms and M is a migration policy. Algorithm 3 depicts the parallel PSO methodology; a sketch of one migration step follows the listing.
Algorithm 3 Multi-swarm optimization [2]
DO IN PARALLEL for each i in 1, ..., m
    initialize(Si)
    while not stop condition do
        iterate Si for n steps /* PSO evolution */
        for each Sj in neighbors(Si) do
            send particles in s(Si) to Sj
        end for
        for each Sj such that Si in neighbors(Sj) do
            receive particles from Sj
            replace particles in Si according to r
        end for
    end while
Output: best solution ever found in the multi-swarm
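The migration policy M can be sketched as follows (a minimal ring topology of our choosing; the number of migrants and the replacement rule r are illustrative assumptions):

import random

def migrate(swarms, fitness, k=2):
    # Each swarm sends copies of its k best particles to its ring neighbor,
    # which replaces its k worst particles with the immigrants.
    bests = [sorted(s, key=fitness, reverse=True)[:k] for s in swarms]
    for i, swarm in enumerate(swarms):
        incoming = bests[(i - 1) % len(swarms)]
        swarm.sort(key=fitness)              # worst particles first
        swarm[:k] = [p[:] for p in incoming]

if __name__ == "__main__":
    fit = lambda p: sum(p)
    swarms = [[[random.randint(0, 1) for _ in range(8)] for _ in range(6)]
              for _ in range(3)]
    migrate(swarms, fit)
    print([max(map(fit, s)) for s in swarms])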
D. Parallel Scatter Search (PSS)
Scatter search is an evolutionary method that has been successfully applied to hard optimization problems. It uses strategies for search diversification and intensification that have proved effective in a variety of optimization problems; see Algorithm 4.
E. Parallel Ant Colony Optimization (PACO)
When dealing with a huge search space, parallel computing techniques are usually applied to improve efficiency. Parallel ACO algorithms can achieve high-quality solutions in reasonable execution times compared with sequential ACO [18]. Algorithm 5 presents the PACO methodology.
Algorithm 4 Parallel scatter search methodology [11]
Create Population (Pop, PopSize)
Generate ReferenceSet (RefSet, RefSetSize)
while Stopping Criterion 1 do
    while Stopping Criterion 2 do
        Select Subset (Subset, SubsetSize)
        for each processor r = 1 to n do in parallel
            Combine Solutions (SubSet, CurSol)
            Improve Solution (CurSol, ImpSol)
        end for
    end while
    Update ReferenceSet (RefSet)
end while
Algorithm 5 Parallel ant colony optimization methodology [18]
Generate ants
Initialize N processors
Multicast N and the task ids of all slaves to all slave processors
for each slave do
    Send a number between 0 and N that identifies the task inside the program
end for
while not all slaves have sent back a solution do
    Wait for a solution
    if a slave returns a solution that is better than any solution received then
        Multicast this solution to all slaves
    end if
end while
Return the best solution
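A minimal Python sketch of this master-slave exchange follows: the slaves construct candidate solutions independently, and the master keeps (and would re-broadcast) the best one received. The toy construction step stands in for real pheromone-guided ant rules and is our assumption.

from multiprocessing import Pool
import random

def build_solution(seed):
    # Stand-in for an ant's construction walk: build a random feature mask
    # and score it against a toy objective (subsets of size 4).
    rng = random.Random(seed)
    mask = [rng.randint(0, 1) for _ in range(12)]
    return -abs(sum(mask) - 4), mask

if __name__ == "__main__":
    best = (float("-inf"), None)
    with Pool(4) as slaves:
        for quality, mask in slaves.imap_unordered(build_solution, range(16)):
            if quality > best[0]:       # better than any solution received?
                best = (quality, mask)  # the master would multicast it here
    print(best)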
IV. PARALLEL EVOLUTIONARY ALGORITHMS FOR
FEATURE SELECTION
We reviewed a set of research papers that deal with the feature selection problem for high-dimensional datasets in a parallel environment using parallel evolutionary algorithms. We discuss these studies in the following subsections.
A. Parallel GA
Liu et al. [5] used a parallel GA with the wrapper approach for selecting informative genes (features) in tissue classification. The main purpose was to find a feature subset with fewer elements and higher accuracy. The parallelization of the GA was performed by dividing the population into sub-populations and then running the GA on each sub-population, so that the search for the optimal subset of genes can run on several CPUs/computers at the same time.
For evaluation, the Golub classifier was used. This classifier was introduced by the authors, and it depends on the sign of the result for classification: if the sign is positive, the sample x belongs to class 1; if it is negative, the sample x belongs to class 2. This classifier can be used only if the dataset has two classes. The accuracy of the classifier was tested using the LOOCV (leave-one-out cross-validation) method. The results showed that using the parallel GA increased the accuracy and reduced the number of genes used for classification.
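The following sketch shows the sign-based decision and the LOOCV loop described above; the scoring rule is a simplified stand-in (projection against the difference of class means), not the exact weighted-voting classifier of [5].

def classify(score):
    # Positive score -> class 1, negative or zero -> class 2.
    return 1 if score > 0 else 2

def mean_diff_score(train, x):
    # Toy scoring rule: project x against the difference of the class means.
    n1 = max(1, sum(l == 1 for _, l in train))
    n2 = max(1, sum(l == 2 for _, l in train))
    m1 = [sum(s[k] for s, l in train if l == 1) / n1 for k in range(len(x))]
    m2 = [sum(s[k] for s, l in train if l == 2) / n2 for k in range(len(x))]
    return sum((a - b) * (v - (a + b) / 2) for a, b, v in zip(m1, m2, x))

def loocv(samples, labels, score_fn):
    # Leave-one-out cross-validation: hold out each sample once.
    correct = 0
    for i in range(len(samples)):
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        correct += classify(score_fn(train, samples[i])) == labels[i]
    return correct / len(samples)

if __name__ == "__main__":
    X = [[2.0, 1.0], [1.8, 1.2], [0.2, 0.9], [0.1, 1.1]]
    y = [1, 1, 2, 2]
    print(loocv(X, y, mean_diff_score))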
In [8], Zheng et al. theoretically analyzed the execution speed and solution quality of several parallel GA schemes. Furthermore, they identified the best parallel GA scheme to use on a multi-core architecture. The paper considered the relationship between speed and parallel architecture along with solution quality. They analyzed the Master-Slave, Synchronous Island, Asynchronous Island, Cellular, and hybrid Master-Slave/Island schemes of parallel GA, implemented with the Pthread library on a multi-core parallel architecture.
To validate their theoretical analysis, experiments were performed. The hybrid scheme (Master-Slave and Asynchronous Island) had the best overall performance on the multi-core architecture. The Island scheme has the best execution time but the worst solution quality; to improve the solution quality when using the Island model, it is better to decrease the number of islands. The Asynchronous Island is faster than the Synchronous one. The Master-Slave scheme has the best solution quality and the worst execution time.
Soufan et al. [15] developed a web-based tool called DWFS, which is used for feature selection in different problems. This tool follows a hybrid wrapper/filter approach. First, the filter is used as preprocessing and selects the top-ranked features based on a tunable, predefined threshold. In the next step, a parallel GA based on the wrapper approach is applied to the selected features to search for feature subsets that increase the classifier accuracy. The parallel GA follows the Master-Slave scheme: the master node creates the initial population and performs the GA steps, while the slave (worker) nodes perform the fitness evaluation of each chromosome. This implementation ran on 64 cores.
For evaluation, they used three different classifiers (a Bayesian classifier, K-nearest neighbor, and a combination of them). The results of the experiments show that the DWFS tool provides many options to enhance feature selection in different biological and biomedical problems.
In [7], Pinho et al. presented a framework called ParJEColi (a Java-based library) for parallel evolutionary computation in bioinformatics applications. The aim of this platform is to make parallel environments (multi-core, cluster, and grid) easy and transparent to the user. The library adapts itself to the problem and the target parallel architecture. The user can easily configure the parallel model and the target architecture, since ParJEColi encapsulates the parallelization
concerns as features; the explicit steps are exposed through a simple GUI.
The experiments validating this framework were done on 2 biological datasets and several bioinformatics scenarios. The results indicate that the proposed framework improves the computational performance (decreases execution time) as well as the solution quality.
B. Parallel CHC
In [1], Peralta et al. presented a parallel evolutionary algorithm, the CHC algorithm, parallelized with the MapReduce paradigm (Hadoop implementation), for selecting features in high-dimensional datasets in order to improve classification. A cluster of 20 computing nodes was used, and each dataset was split into 512 map tasks. For evaluating their work, three classifiers were used: SVM (support vector machine), logistic regression, and a Bayesian classifier.
The results showed that the runtime for classification increased as the number of features decreased, except for the Bayesian classifier. They explained this result as follows: if the number of blocks is less than the number of computing machines, some machines remain idle; and if the number of blocks is greater than the number of computing machines, the blocks may not be distributed in an efficient way.
They compared the parallel CHC with the serial version and concluded that the accuracy of classification increased by using the parallel CHC. Furthermore, the parallel version of CHC reduced the runtime on high-dimensional datasets.
C. Parallel PSO
PSO is an efficient optimization technique that has been used to solve the feature selection problem in high-dimensional datasets. In [4], Chen et al. used a parallel PSO algorithm to solve two problems at the same time, by creating an objective function that takes three variables into account (the selected features, the number of support vectors, and the average accuracy of the SVM), in order to maximize the generalization capability of the SVM classifier.
The proposed method, called PTVPSO-SVM (parallel time-variant particle swarm optimization support vector machine), has two phases: 1) the parameter setting of the SVM and the feature selection are performed together; 2) the accuracy of the SVM is evaluated using the set of features and the optimal parameters from the first phase.
They used a parallel virtual machine (PVM) with 8 machines and 10-fold cross-validation. The results showed that they achieved the following aims: increasing the classification accuracy, reducing the execution time compared with sequential PSO, producing an appropriate parameter model, and selecting the most discriminative subset of features.
Feature selection can also be carried out based on rough set theory with a search algorithm, as in [3, 6]. In [6], Qian et al. proposed three parallel attribute reduction (feature selection) algorithms based on MapReduce on Hadoop. The first algorithm is built by constructing the proper (key, value) pairs from rough set theory and implementing the corresponding MapReduce functions (sketched below). The second realizes the parallel computation of equivalence classes and attribute significances. The last parallel algorithm is designed to acquire the core attributes and a reduct, with parallelism over both data and tasks.
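The first of these algorithms can be pictured with a toy in-memory map/reduce pair: the map phase emits each record's values on the candidate attribute set as a key, and the reduce phase counts the resulting equivalence classes. This is our simplified stand-in for the Hadoop jobs of [6], not their code.

from collections import defaultdict

def map_phase(records, attributes):
    # Emit (signature, 1) for every record; records that agree on all the
    # chosen attributes share a key, i.e., an equivalence class.
    for record in records:
        yield tuple(record[a] for a in attributes), 1

def reduce_phase(pairs):
    # Sum the counts per key to obtain the size of each equivalence class.
    classes = defaultdict(int)
    for key, count in pairs:
        classes[key] += count
    return dict(classes)

if __name__ == "__main__":
    data = [{"a": 1, "b": 0, "d": "yes"},
            {"a": 1, "b": 0, "d": "no"},
            {"a": 0, "b": 1, "d": "no"}]
    print(reduce_phase(map_phase(data, ["a", "b"])))
    # {(1, 0): 2, (0, 1): 1}: the first two records are indiscernible on {a, b}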
The experiments were performed on a cluster of computers (17 computing nodes). They considered the performance of the parallel algorithms but did not focus on the classification accuracy, since the sequential and parallel algorithms give the same results. The results showed that the proposed parallel attribute reduction algorithms can deal with high-dimensional datasets efficiently and perform better than the sequential algorithms.
In [3], Adamczyk used rough set theory for attribute reduction and, to increase efficiency, implemented a parallel asynchronous PSO for this problem. The parallelization was done by assigning the expensive function computations to slave cores, while the main core updates the particles and checks the convergence of the algorithm.
The experiments showed that the efficiency and speedup of the parallel PSO algorithm rose as the size of the dataset increased. The achieved accuracy was not astonishing, but it was better than that of the classical algorithms.
D. Parallel GPSO
In [2], Garcia-Nieto et al. parallelized a version of PSO called GPSO, which is suitable for the feature selection problem in high-dimensional datasets. The proposed method, called PMSO (parallel multi-swarm optimizer), runs a set of parallel sub-PSO algorithms that form an island model; a migration operation exchanges solutions between islands at a certain frequency. The aim of the fitness function is to increase the classification accuracy and reduce the number of selected genes (features).
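A fitness of this kind can be sketched as a weighted sum of the classifier accuracy and the fraction of discarded features; the weight alpha below is an illustrative assumption, not the exact objective of [2].

def fitness(accuracy, n_selected, n_total, alpha=0.9):
    # Reward high accuracy and small feature subsets.
    return alpha * accuracy + (1 - alpha) * (1 - n_selected / n_total)

if __name__ == "__main__":
    print(fitness(accuracy=0.95, n_selected=12, n_total=2000))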
They used the SVM classifier (support vector machine) to prove the accuracy of the selected subset of features. In their experiments, they used a cluster of computers as the
parallel architecture. They found that the 8-swarm PMSO was the best choice for parallelization. The results pointed out that this algorithm was better than the sequential version and other methods in terms of performance and accuracy, while it selected few genes for each subset.
E. Parallel SS
In [11], Lopez et al. presented a parallel SS metaheuristic for solving the feature selection problem in classification. They proposed two methods for combining solutions in SS.
The first method is called GC (greedy combination): in this strategy, the features common to the combined solutions are added first; then, at each iteration, one of the remaining features is added to each new solution (see the sketch after this paragraph).
The second strategy is called RGC (reduced greedy combination); it starts the same way as GC, but in the next step it considers only the features that appear in solutions of good quality. The parallelization of SS is then obtained by running these two methods (GC, RGC) at the same time on two processors, using different combination methods and parameter settings on each processor.
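A minimal sketch of the GC operator follows (our illustration of the description above; the evaluation function and target size are placeholder assumptions, not the setup of [11]):

def greedy_combine(sol_a, sol_b, evaluate, target_size):
    # Start from the features shared by both parent solutions, then
    # greedily add the remaining feature that most improves the score.
    combined = set(sol_a & sol_b)
    remaining = (sol_a | sol_b) - combined
    while remaining and len(combined) < target_size:
        best = max(remaining, key=lambda f: evaluate(combined | {f}))
        combined.add(best)
        remaining.remove(best)
    return combined

if __name__ == "__main__":
    score = lambda feats: -len(feats) + (10 if 3 in feats else 0)
    print(greedy_combine({1, 2, 3}, {2, 3, 4}, score, target_size=3))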
They compared the proposed parallel SS with the sequential SS and a GA. The results show that the solution quality of the parallel SS is better than that of the solutions obtained by the sequential SS and the GA. Also, the parallel SS uses a smaller set of features for classification. The runtime is the same for the parallel and sequential SS.
F. Parallel ACO
This subsection shows how parallel ACO is used to solve the feature selection problem for classification in high-dimensional datasets.
In [17], Meena et al. implemented a parallel ACO to solve the feature selection problem for long documents. The parallelization was done using the MapReduce programming model (Hadoop), which automatically parallelizes the code and data and runs them on a cluster of computing nodes. The wrapper approach, with a Bayesian classifier, is used as the evaluation criterion. Furthermore, the accuracy of the classifier was assessed using these metrics: precision, recall, accuracy, and F-measure.
The enhanced algorithm (parallel ACO) was compared with ACO, enhanced ACO, and two feature selection methods: CHI (a statistical technique) and IG (Information Gain). They used a Bayesian classifier in the evaluation process. The results showed that, for a given fixed quality of the solutions, the proposed algorithm could reduce the execution time, though solution quality itself was not considered. On the other hand, the accuracy of the classifier was increased using parallel
TABLE I
SUMMARY OF ALGORITHMS AND PROGRAMMING MODELS

Paper                   | Evolutionary algorithm | Parallel programming model
------------------------+------------------------+---------------------------
Peralta et al. [1]      | CHC (type of GA)       | MapReduce
Garcia-Nieto et al. [2] | GPSO                   | MALLBA
Adamczyk [3]            | PSO                    | Unknown
Chen et al. [4]         | PSO                    | PVM
Liu et al. [5]          | GA                     | Unknown
Lopez et al. [11]       | SS                     | Unknown
Soufan et al. [15]      | GA                     | MPI
Meena et al. [17]       | ACO                    | MapReduce
ACO compared with sequential ACO and the other feature selection methods.
In [12], Cano et al. parallelized an existing multi-objective ant programming model that is used as a classifier. The algorithm was used for rule mining in high-dimensional datasets. The parallelization was done over the data, with each ant encoding a rule: each processor performs the same task on a different subset of the data at the same time. The implementation used GPUs, which are multi-core, massively parallel processor architectures, following the CUDA model.
For evaluation they used these metrics: true positives, false positives, true negatives, false negatives, sensitivity, and specificity. The results indicate that the efficiency of this model increased as the size of the dataset increased.
V. SUMMARY AND DISCUSSION
The papers that implemented parallel EAs for solving the classification problem in high-dimensional datasets are summarized in Table I and Table II.
Many research papers [2, 3, 7, 8, 9, 10, 12] stated that the execution time can be reduced and acceptable speedups achieved when applying parallel evolutionary algorithms on multiple processors, and we noticed that they achieved a reasonable speedup in many cases.
When comparing the accuracy of the parallel EAs in Table II, it is important to notice how many classifiers were used to measure the accuracy. Furthermore, we should consider the metrics that were used to evaluate the classifier. For example, the parallel PSO and its variants have the highest accuracy, but they used only one metric, the success rate; this means that the parallel PSO is not necessarily the most accurate parallel EA based on Table II.
On the other hand, the parallel GA and its variants have the lowest accuracy, but they used from two to five metrics for evaluation purposes. Based on these metrics, we can say that
TABLE II
SUMMARY OF DATASETS, CLASSIFIERS, AND ACCURACY RESULTS

Paper                   | Dataset     | Classifier          | Classification metric | Accuracy
------------------------+-------------+---------------------+-----------------------+------------------
Peralta et al. [1]      | Epsilon     | Bayesian            | AUC = (TPR+TNR)/2     | 0.71
                        |             | SVM                 |                       | 0.68
                        |             | Logistic Regression |                       | 0.70
                        | ECBDL14-ROS | Bayesian            |                       | 0.67
                        |             | SVM                 |                       | 0.63
                        |             | Logistic Regression |                       | 0.63
Garcia-Nieto et al. [2] | Colon       | SVM                 | Success rate          | 0.85
                        | Lymp        |                     |                       | 0.97
                        | Leuk        |                     |                       | 0.98
                        | Lung        |                     |                       | 0.97
Adamczyk [3]            | 15 datasets | —                   | Success rate          | 0.70 (avg)
Chen et al. [4]         | 30 datasets | SVM                 | Success rate          | 0.87 (avg)
Liu et al. [5]          | Leukemia    | Golub               | Success rate          | 0.88
                        | Colon       |                     |                       | N/A
Lopez et al. [11]       | 12 datasets | Nearest Neighbor    | Success rate          | 0.86 (avg)
                        |             | Bayesian            |                       | 0.87 (avg)
                        |             | Decision Tree       |                       | 0.86 (avg)
Soufan et al. [15]      | 9 datasets  | K-Nearest Neighbor  | F1, PPV, GMean, ...   | 0.81 (avg GMean)
                        |             | Bayesian            |                       | 0.79 (avg GMean)
Meena et al. [17]       | 2 datasets  | Bayesian            | F-measure, recall,... | 0.64 (avg)
the parallel GA is the best parallel EA for feature selection in high-dimensional datasets.
VI. CONCLUSION
After reviewing the different parallel EAs that are used to solve the feature selection problem in high-dimensional datasets, we adopted the accuracy as a measure to compare the algorithms' performance. The following points summarize our conclusions about the performance of the reviewed algorithms for feature selection:
• GA and its variants: based on the papers we reviewed, the parallel GA has the highest accuracy.
• PSO and its variants: the parallel PSO has the same accuracy as the sequential PSO.
• SS: the parallel SS gives better results in terms of accuracy than the GA and the sequential SS.
• ACO: the parallel ACO has less accurate results than the other parallel EAs.
It is noticeable that PGAs are the most suitable algorithms for feature selection in large datasets, since they achieved the highest accuracy. On the other hand, the PACO is time-consuming and less accurate compared with the other parallel EAs.
References
[1] D. Peralta, S. del Río, S. Ramírez-Gallego, I. Triguero, J. Benitez, F. Herrera. "Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach". Mathematical Problems in Engineering, Hindawi Publishing Corporation, Article ID 246139, (2015).
[2] J. García-Nieto, E. Alba. "Parallel multi-swarm optimizer for gene selection in DNA microarrays". Applied Intelligence 37:255-266, (2012). DOI 10.1007/s10489-011-0325-9.
[3] M. Adamczyk. "Parallel Feature Selection Algorithm based on Rough Sets and Particle Swarm Optimization". ACSIS, Vol. 2, IEEE, (2014). DOI 10.15439/2014F389.
[4] H.-L. Chen, B. Yang, S. Wang, G. Wang, D. Liu, H.-Z. Li, W. Liu. "Towards an optimal support vector machine classifier using a parallel particle swarm optimization strategy". Applied Mathematics and Computation 239:180-197. Elsevier, (2014).
[5] J. Liu, H. Iba. "Selecting Informative Genes with Parallel Genetic Algorithms in Tissue Classification". Genome Informatics 12:14-23, (2001).
[6] J. Qian, D. Miao, Z. Zhang, X. Yue. "Parallel attribute reduction algorithms using MapReduce". Information Sciences. Elsevier, (2014).
[7] J. Pinho, L. Sobral, M. Rocha. "Parallel evolutionary computation in bioinformatics applications". Computer Methods and Programs in Biomedicine 110:183-191. Elsevier, (2013).
[8] L. Zheng, Y. Lu, M. Ding, Y. Shen, M. Guo, S. Guo. "Architecture-based performance evaluation of genetic algorithms on multi/many-core systems". The 14th IEEE International Conference on Computational Science and Engineering. IEEE, (2011). DOI 10.1109/CSE.2011.65.
[9] S. Santander-Jimenez, M. Vega-Rodríguez. "Parallel Multiobjective Metaheuristics for Inferring Phylogenies on Multicore Clusters". IEEE, (2014). DOI 10.1109/TPDS.2014.2325828.
[10] J. Petrlik, L. Sekanina. "Towards Robust and Accurate Traffic Prediction Using Parallel Multiobjective Genetic Algorithms and Support Vector Regression". IEEE 18th International Conference on Intelligent Transportation Systems, (2015).
[11] F. Lopez, M. Torres, B. Batista, J. Perez, J. Vega. "Solving feature selection problem by a parallel Scatter Search". European Journal of Operational Research 169:477-489. Elsevier, (2006).
[12] A. Cano, J. Olmo, S. Ventura. "Parallel Multi-Objective Ant Programming for Classification Using GPUs". Journal of Parallel and Distributed Computing, November (2012).
[13] E. Alba, G. Luque, S. Nesmachnow. "Parallel metaheuristics: recent advances and new trends". International Transactions in Operational Research 20:1-48, (2013). DOI 10.1111/j.1475-3995.2012.00862.x.
[14] H. Liu, H. Motoda (Eds.). "Instance Selection and Construction for Data Mining". Springer, (2001).
[15] O. Soufan, D. Kleftogiannis, P. Kalnis, V. Bajic. "DWFS: A Wrapper Feature Selection Tool Based on a Parallel Genetic Algorithm". PLOS ONE, February (2015). DOI 10.1371/journal.pone.0117988.
[16] A. Gottlieb, G. S. Almasi. "Highly Parallel Computing". Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, USA, (1989).
[17] M. Meena, K. R. Chandran, A. Karthik, A. Samuel. "A parallel ACO algorithm to select terms to categorise longer documents". International Journal of Computational Science and Engineering, Vol. 6, No. 4, (2011).
[18] A. Sameh, A. Ayman, N. Hasan. "Parallel Ant Colony Optimization". International Journal of Research and Reviews in Computer Science (IJRRCS), Vol. 1, No. 2, June (2010).
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 3, March 2018
188 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
Ad

Recommended

PDF
Particle Swarm Optimization based K-Prototype Clustering Algorithm
iosrjce
 
PDF
Extended pso algorithm for improvement problems k means clustering algorithm
IJMIT JOURNAL
 
DOC
Predictive job scheduling in a connection limited system using parallel genet...
Mumbai Academisc
 
PDF
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
IJDKP
 
PDF
A Novel Approach for Clustering Big Data based on MapReduce
IJECEIAES
 
PDF
Using particle swarm optimization to solve test functions problems
riyaniaes
 
PDF
AUTOMATIC GENERATION AND OPTIMIZATION OF TEST DATA USING HARMONY SEARCH ALGOR...
csandit
 
PDF
IJCSI-2015-12-2-10138 (1) (2)
Dr Muhannad Al-Hasan
 
PDF
An enhanced pairwise search approach for generating
Alexander Decker
 
PDF
A Genetic Algorithm on Optimization Test Functions
IJMERJOURNAL
 
PDF
SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY
IJDKP
 
PDF
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
aciijournal
 
PDF
A h k clustering algorithm for high dimensional data using ensemble learning
ijitcs
 
PDF
BINARY SINE COSINE ALGORITHMS FOR FEATURE SELECTION FROM MEDICAL DATA
acijjournal
 
PDF
GCUBE INDEXING
IJDKP
 
PDF
An Automatic Clustering Technique for Optimal Clusters
IJCSEA Journal
 
PDF
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
CSCJournals
 
PDF
Applying genetic algorithms to information retrieval using vector space model
IJCSEA Journal
 
PDF
HYBRID GENETIC ALGORITHM FOR BI-CRITERIA MULTIPROCESSOR TASK SCHEDULING WITH ...
aciijournal
 
PDF
Feature selection using modified particle swarm optimisation for face recogni...
eSAT Journals
 
DOCX
IEEE 2014 JAVA DATA MINING PROJECTS A fast clustering based feature subset se...
IEEEFINALYEARSTUDENTPROJECTS
 
PDF
A046010107
IJERA Editor
 
PDF
Applying Genetic Algorithms to Information Retrieval Using Vector Space Model
IJCSEA Journal
 
PDF
The pertinent single-attribute-based classifier for small datasets classific...
IJECEIAES
 
PDF
EvoFeat: Genetic Programming-based Feature Engineering Approach to Tabular Da...
Hengzhe Zhang
 
PDF
IRJET- Survey of Feature Selection based on Ant Colony
IRJET Journal
 
PPSX
9th ITAB 2009 Parallel-MEGA
Christos Kannas
 
PDF
Parallel and distributed genetic algorithm with multiple objectives to impro...
khalil IBRAHIM
 
PDF
Enhancing feature selection with a novel hybrid approach incorporating geneti...
IJECEIAES
 
PDF
Optimal Feature Selection from VMware ESXi 5.1
ijccmsjournal
 

More Related Content

What's hot (16)

PDF
An enhanced pairwise search approach for generating
Alexander Decker
 
PDF
A Genetic Algorithm on Optimization Test Functions
IJMERJOURNAL
 
PDF
SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY
IJDKP
 
PDF
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
aciijournal
 
PDF
A h k clustering algorithm for high dimensional data using ensemble learning
ijitcs
 
PDF
BINARY SINE COSINE ALGORITHMS FOR FEATURE SELECTION FROM MEDICAL DATA
acijjournal
 
PDF
GCUBE INDEXING
IJDKP
 
PDF
An Automatic Clustering Technique for Optimal Clusters
IJCSEA Journal
 
PDF
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
CSCJournals
 
PDF
Applying genetic algorithms to information retrieval using vector space model
IJCSEA Journal
 
PDF
HYBRID GENETIC ALGORITHM FOR BI-CRITERIA MULTIPROCESSOR TASK SCHEDULING WITH ...
aciijournal
 
PDF
Feature selection using modified particle swarm optimisation for face recogni...
eSAT Journals
 
DOCX
IEEE 2014 JAVA DATA MINING PROJECTS A fast clustering based feature subset se...
IEEEFINALYEARSTUDENTPROJECTS
 
PDF
A046010107
IJERA Editor
 
PDF
Applying Genetic Algorithms to Information Retrieval Using Vector Space Model
IJCSEA Journal
 
PDF
The pertinent single-attribute-based classifier for small datasets classific...
IJECEIAES
 
An enhanced pairwise search approach for generating
Alexander Decker
 
A Genetic Algorithm on Optimization Test Functions
IJMERJOURNAL
 
SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY
IJDKP
 
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm
aciijournal
 
A h k clustering algorithm for high dimensional data using ensemble learning
ijitcs
 
BINARY SINE COSINE ALGORITHMS FOR FEATURE SELECTION FROM MEDICAL DATA
acijjournal
 
GCUBE INDEXING
IJDKP
 
An Automatic Clustering Technique for Optimal Clusters
IJCSEA Journal
 
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
CSCJournals
 
Applying genetic algorithms to information retrieval using vector space model
IJCSEA Journal
 
HYBRID GENETIC ALGORITHM FOR BI-CRITERIA MULTIPROCESSOR TASK SCHEDULING WITH ...
aciijournal
 
Feature selection using modified particle swarm optimisation for face recogni...
eSAT Journals
 
IEEE 2014 JAVA DATA MINING PROJECTS A fast clustering based feature subset se...
IEEEFINALYEARSTUDENTPROJECTS
 
A046010107
IJERA Editor
 
Applying Genetic Algorithms to Information Retrieval Using Vector Space Model
IJCSEA Journal
 
The pertinent single-attribute-based classifier for small datasets classific...
IJECEIAES
 

Similar to Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Datasets (20)

PDF
EvoFeat: Genetic Programming-based Feature Engineering Approach to Tabular Da...
Hengzhe Zhang
 
PDF
IRJET- Survey of Feature Selection based on Ant Colony
IRJET Journal
 
PPSX
9th ITAB 2009 Parallel-MEGA
Christos Kannas
 
PDF
Parallel and distributed genetic algorithm with multiple objectives to impro...
khalil IBRAHIM
 
PDF
Enhancing feature selection with a novel hybrid approach incorporating geneti...
IJECEIAES
 
PDF
Optimal Feature Selection from VMware ESXi 5.1
ijccmsjournal
 
PDF
Optimal Feature Selection from VMware ESXi 5.1 Feature Set
ijccmsjournal
 
PDF
Optimal Feature Selection from VMware ESXi 5.1 Feature Set
ijccmsjournal
 
PDF
Optimal Feature Selection from VMware ESXi 5.1 Feature Set
ijccmsjournal
 
PDF
Optimal Feature Selection from VMware ESXi 5.1 Feature Set
ijccmsjournal
 
PDF
Integrated bio-search approaches with multi-objective algorithms for optimiza...
TELKOMNIKA JOURNAL
 
PDF
Study on Relavance Feature Selection Methods
IRJET Journal
 
PDF
777777777777777777777777777777777777777.pdf
AsimRaza417630
 
PDF
Computational Methods of Feature Selection 1st Edition Huan Liu (Editor)
lemanamiddag
 
PDF
Computational Methods of Feature Selection 1st Edition Huan Liu (Editor)
doweyhostel
 
PDF
MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA...
ijcsa
 
PDF
Hakimi asiabar, m. 2009: multi-objective genetic local search algorithm using...
ArchiLab 7
 
PDF
Parallel Genetic Algorithms in the Cloud
Pasquale Salza
 
PDF
The International Journal of Engineering and Science (The IJES)
theijes
 
PDF
Computational Methods Of Feature Selection Huan Liu Hiroshi Motoda
rasabigley
 
EvoFeat: Genetic Programming-based Feature Engineering Approach to Tabular Da...
Hengzhe Zhang
 
IRJET- Survey of Feature Selection based on Ant Colony
IRJET Journal
 
9th ITAB 2009 Parallel-MEGA
Christos Kannas
 
Parallel and distributed genetic algorithm with multiple objectives to impro...
khalil IBRAHIM
 
Enhancing feature selection with a novel hybrid approach incorporating geneti...
IJECEIAES
 
Optimal Feature Selection from VMware ESXi 5.1
ijccmsjournal
 
Optimal Feature Selection from VMware ESXi 5.1 Feature Set
ijccmsjournal
 
Optimal Feature Selection from VMware ESXi 5.1 Feature Set
ijccmsjournal
 
Optimal Feature Selection from VMware ESXi 5.1 Feature Set
ijccmsjournal
 
Optimal Feature Selection from VMware ESXi 5.1 Feature Set
ijccmsjournal
 
Integrated bio-search approaches with multi-objective algorithms for optimiza...
TELKOMNIKA JOURNAL
 
Study on Relavance Feature Selection Methods
IRJET Journal
 
777777777777777777777777777777777777777.pdf
AsimRaza417630
 
Computational Methods of Feature Selection 1st Edition Huan Liu (Editor)
lemanamiddag
 
Computational Methods of Feature Selection 1st Edition Huan Liu (Editor)
doweyhostel
 
MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA...
ijcsa
 
Hakimi asiabar, m. 2009: multi-objective genetic local search algorithm using...
ArchiLab 7
 
Parallel Genetic Algorithms in the Cloud
Pasquale Salza
 
The International Journal of Engineering and Science (The IJES)
theijes
 
Computational Methods Of Feature Selection Huan Liu Hiroshi Motoda
rasabigley
 
Ad

Recently uploaded (20)

DOCX
Daily Lesson Log MATATAG ICT TEchnology 8
LOIDAALMAZAN3
 
PPTX
UserCon Belgium: Honey, VMware increased my bill
stijn40
 
PDF
Optimizing the trajectory of a wheel loader working in short loading cycles
Reno Filla
 
PDF
How to Visualize the ​Spatio-Temporal Data Using CesiumJS​
SANGHEE SHIN
 
PDF
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
Edge AI and Vision Alliance
 
PDF
LLM Search Readiness Audit - Dentsu x SEO Square - June 2025.pdf
Nick Samuel
 
PPTX
Simplifica la seguridad en la nube y la detección de amenazas con FortiCNAPP
Cristian Garcia G.
 
PDF
Java 25 and Beyond - A Roadmap of Innovations
Ana-Maria Mihalceanu
 
PPTX
reInforce 2025 Lightning Talk - Scott Francis.pptx
ScottFrancis51
 
PPTX
01_Approach Cyber- DORA Incident Management.pptx
FinTech Belgium
 
PDF
Database Benchmarking for Performance Masterclass: Session 2 - Data Modeling ...
ScyllaDB
 
PDF
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
PDF
Why aren't you using FME Flow's CPU Time?
Safe Software
 
PDF
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
Edge AI and Vision Alliance
 
PDF
Redefining Work in the Age of AI - What to expect? How to prepare? Why it mat...
Malinda Kapuruge
 
PDF
The Future of Product Management in AI ERA.pdf
Alyona Owens
 
PPTX
MARTSIA: A Tool for Confidential Data Exchange via Public Blockchain - Poster...
Michele Kryston
 
PPTX
Practical Applications of AI in Local Government
OnBoard
 
PDF
My Journey from CAD to BIM: A True Underdog Story
Safe Software
 
PDF
Unlocking FME Flow’s Potential: Architecture Design for Modern Enterprises
Safe Software
 
Daily Lesson Log MATATAG ICT TEchnology 8
LOIDAALMAZAN3
 
UserCon Belgium: Honey, VMware increased my bill
stijn40
 
Optimizing the trajectory of a wheel loader working in short loading cycles
Reno Filla
 
How to Visualize the ​Spatio-Temporal Data Using CesiumJS​
SANGHEE SHIN
 
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
Edge AI and Vision Alliance
 
LLM Search Readiness Audit - Dentsu x SEO Square - June 2025.pdf
Nick Samuel
 
Simplifica la seguridad en la nube y la detección de amenazas con FortiCNAPP
Cristian Garcia G.
 
Java 25 and Beyond - A Roadmap of Innovations
Ana-Maria Mihalceanu
 
reInforce 2025 Lightning Talk - Scott Francis.pptx
ScottFrancis51
 
01_Approach Cyber- DORA Incident Management.pptx
FinTech Belgium
 
Database Benchmarking for Performance Masterclass: Session 2 - Data Modeling ...
ScyllaDB
 
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
Why aren't you using FME Flow's CPU Time?
Safe Software
 
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
Edge AI and Vision Alliance
 
Redefining Work in the Age of AI - What to expect? How to prepare? Why it mat...
Malinda Kapuruge
 
The Future of Product Management in AI ERA.pdf
Alyona Owens
 
MARTSIA: A Tool for Confidential Data Exchange via Public Blockchain - Poster...
Michele Kryston
 
Practical Applications of AI in Local Government
OnBoard
 
My Journey from CAD to BIM: A True Underdog Story
Safe Software
 
Unlocking FME Flow’s Potential: Architecture Design for Modern Enterprises
Safe Software
 
Ad

Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Datasets

  • 1. Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Datasets Safa Adi #1 , Mohammed Aldasht ∗2 # Computer and IT Department, Palestine Polytechnic University Palestine 1 [email protected] ∗ Computer Engineering Department, Palestine Polytechnic University Palestine 2 [email protected] Abstract—Feature selection in high-dimensional datasets is considered to be a complex and time-consuming problem. To enhance the accuracy of classification and reduce the execution time, Parallel Evolutionary Algorithms (PEAs) can be used. In this paper, we make a review for the most recent works which handle the use of PEAs for feature selection in large datasets. We have classified the algorithms in these papers into four main classes (Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Scattered Search (SS), and Ant Colony Optimization (ACO)). The accuracy is adopted as a measure to compare the efficiency of these PEAs. It is noticeable that the Parallel Genetic Algorithms (PGAs) are the most suitable algorithms for feature selection in large datasets; since they achieve the highest accuracy. On the other hand, we found that the Parallel ACO is time- consuming and less accurate comparing with other PEA. Index Terms: Evolutionary algorithms, parallel com- puting, classification, feature selection, high dimensional dataset. I. INTRODUCTION Nowadays many disciplines have to deal with high dimensional datasets which involve a huge number of features. So we need data preprocessing methods and data reduction models in order to simplify input data. There are two main types of data reduction models [1]. The first is: instance selection and instance generation processes are focused on the instance level. (i.e. select a representative portion of data that can fulfill a data mining task as if the whole data is used) [14]. The second is: feature selection and feature extraction models which work at the level of characteristics. These models attempt to reduce a dataset by removing noisy, irrelevant, or redundant features. Feature selection is a necessary preprocessing step in analyzing big datasets. It often leads to smaller data that will make the classifier training better and faster [3]. Feature selection is a problem with big datasets. In order to make classification faster and more accurate, we need to select the subset of features that are discriminative. Evolutionary algorithms like Genetic algorithms, Swarm intelligence optimization, Ant colony optimization, etc. These methods can be effective for this problem, but they require a huge amount of computation (long execution time), also memory consumption. In order to overcome these weaknesses, parallel computing can be used. In this survey, we will review a set of papers about parallel evolutionary algorithms that used for feature selection in large datasets. Furthermore, we will compare the performance of different algorithms and environment. The rest of the paper is organized as follow: Section2 is Background about feature selection approaches and parallel architecture in general. Section3 talk about parallel evolutionary algorithms. Section 4 will discuss and review many papers which talk about the feature selection problem by using parallel computing. Section5 contains the summary of the survey, the last section is the conclusion and future work. II. BACKGROUND In general, there are three classes of feature selection: filter-based, wrapper, and embedded. 
The filter approach analyzes the features statistically and ignores the classifier [18]. Most of filter-based methods perform two operations, ranking and subset selection. In some cases, these two operations are performed sequentially, first the ranking, then the selection, in other cases only the selection is carried out. These methods are effective in terms of execution time. However, filter methods sometimes select redundant variables; since they don’t consider the relationships between variables. Therefore, they are mainly used as a pre-processing method. In the wrapper model [15], the process of feature selection is depending on the performance of a specific classifier. But its disadvantages are time-consuming and over fitting. The last International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 3, March 2018 181 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 2. method for feature selection is the embedded. In this method, the feature selection process and the learning algorithm (tuning the parameters) are combined to each other[6, 15]. The selection of optimal feature subset is an optimization problem that proved to be NP-hard, complex, and time- consuming problem [13]. Two major approaches are traditionally used to tackle NP-hard problems, as seen in Figure1: exact methods and metaheuristics. Exact methods allow exact solution to be found, but this approach is impractical since it is extremely time consuming for real world problems. On the other hand, metaheuristics are used for solving complex and real world problems. Because metaheuristics provide suboptimal (sometimes optimal) solution in reasonable time [2, 11, 13]. As seen in Figure1, Metaheuristics are divided into two categories [13]: • Trajectory-based (exploitation-oriented methods): the well-known metaheuristics families based on the manip- ulation of a single solution. Include Simulated Annealing (SA), Tabu Search (TS), Iterated Local Search (ILS), Variable Local Search (VNS), and Greedy Randomized Adaptive Search Procedures (GRASP). • Population-based (exploration-oriented methods): the well-known metaheuristics families based on the manipulation of a population of solutions. Include PSO, ACO, SS, Evolutionary Algorithms (EAs), Differential Evolution (DE), Evolutionary Strategies (ES), and Estimation Distribution Algorithms (EDA). Fig. 1. Approaches for handling NP-hard problems Metaheuristics algorithms have proved to be suitable tools for solving the feature selection accurately and efficiently for large dimensions in big datasets [2]. The main problems when dealing with big datasets are: The first is execution time because the complexity of the metaheuristics methods for feature selection is at least O(n2 D), where n is the number of instances and D is the number of features. The second is memory consumption since most methods for feature selection need to store the whole dataset in memory. Therefore, the researchers try to parallelize the sequential metaheuristics to improve their efficiency for feature selection on large datasets. There are many programming models and paradigms, such as MapReduce (Hadoop, spark), MPI, OpenMP, CUDA [1, 6, 13]. Parallel computing can be process interaction (shared memory, message passing) or problem decomposition (task or data parallelization) [6]. Parallel computing is a good solution for these problems since many calculations are carried out simultaneously in the task and/or data [6]. Population-based metaheuristics are naturally prone to parallelize since most of their variation operators can be easily undertaken in parallel [2, 13]. Parallel implementations of metaheuristics are an effective alternative to speed up sequential metaheuristics; by reducing the search time for solutions of optimization problems. Furthermore, they lead to the more precise random algorithm and improve the quality of solutions [11]. As seen in Figure2, the implementation of parallel metaheuristics is divided into two categories [13]. Fig. 2. Parallel implementation of metaheuristics Parallel evolutionary algorithms are used in many works rather than feature selection, such as inferring phylogenies, traffic prediction. In [9] Santander et al., used MPI/OpenMP with a hybrid multiobjective evolutionary algorithm (fast non- dominated sorting genetic algorithms and firefly algorithm); for phylogenetic reconstruction (Inferring evolutionary trees). 
In [10] Jiri at al., used parallel multiobjective GA with OpenMP. In order to make traffic prediction more accurate. Master-Slave scheme of GA was implemented on multi-core parallel architecture. They reduced the computational time, but it was successful for short-term traffic prediction. III. OVERVIEW OF PARALLEL EVOLUTIONARY ALGORITHMS FOR FEATURE SELECTION Feature selection algorithms are used to find an optimal subset of relevant features in the data. In this section we will talk about parallel evolutionary algorithms that are used for feature selection problem in large datasets. We will illustrate the steps of six algorithms (PGA, PCHC, PPSO, PGPSO, PSS, and PACO). International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 3, March 2018 182 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 3. A. Parallel Genetic algorithm (PGA) In order to increase the efficiency and reduce the execution time of the genetic algorithm (GA); the researchers used par- allel GA. Algorithm 1 presents the parallel GA methodology, with the master-slave model of parallel GA. Algorithm 1 Parallel genetic algorithm [10] Create initial population Evaluate initial population Create slaves while not done do Start slave Wait for slave to finish Run mutation operator end while for i=1 to slave iterations do Select individuals Run crossover operator Evaluate offsprings if solution found then set done=True end if end for B. Parallel CHC algorithm (PCHC) A CHC is a non-traditional GA, which combines a con- servative selection strategy (that always preserves the best individuals found so far), that produces offsprings that are at the maximum hamming distance from their parent. The main processes of CHC algorithm are [1]: • Half-Uniform Crossover (HUX): This will produce two offsprings, which are maximally different from their two parents. • Elitist selection: this will keep the best solutions in each generation. • Incest prevention: this step prevents two individuals to mate if the similarity between them greater than a thresh- old. • The Restarting process: if the specified population stagnated, then this step generated a new population by choosing the best individuals. C. Particle Swarm Optimization (PSO) This subsection handles the geometric particle swarm optimization (GPSO) and shows the algorithm that used to parallelize PSO or GPSO. 1) Geometric Particle Swarm Optimization (GPSO): GPSO is a recent version of PSO. The key issue in GPSO is the using a multi-parental recombination of solutions (particles). In the first phase, a random initialization of particles created. Then the algorithm evaluates these particles to update the historical and social positions. Finally, the three parents (3PMBCX) move the particles, as shown in Algorithm 2: Algorithm 2 GPSO algorithm [2] S:SwarmInitialization() while not stop condition do do for each particle i of the swarm S do do evaluate(solution(xi)) update(velocity equation (hi)) update(global best solution (gi)) end for for each particle i of the swarm S do do xi:3PMBCX ((xi, wa), (gi, wb), (hi, wc)) mutate(xi) end for end while Output: best solution found 2) Parallel Multi Swarm Optimization (PMSO): Parallel multi swarm optimization presented in [2], it was defined in analogy with parallel GA as a pair of (S, M), where S is a collection swarm, and M is a migration policy. Algorithm 3 depicts the parallel PSO methodology. Algorithm 3 Multi swarm optimization [2] DO IN PARALLEL for each i 1,...,m initialize(Si) while not stop condition do do iterate Si for n steps /* PSO evolution */ for each Sj (Si) do do send particles in s(Si) to Sj end for for each Sj such that Si (Sj ) do do receive particles from Sj replace particles in Si according to r end for end while Output: best solution ever found in the multi-swarm D. Parallel Scatter Search (PSS) Scatter search is an evolutionary method that was success- fully applied to hard optimization problems. It uses strategies for search diversification and intensification that have proved effective in a variety of optimization problems, see Algorithm 4. E. Parallel Ant Colony Optimization (PACO) When dealing with huge search space, parallel computing techniques usually applied to improve the efficiency. 
C. Particle Swarm Optimization (PSO)

This subsection covers geometric particle swarm optimization (GPSO) and shows the algorithms used to parallelize PSO and GPSO.

1) Geometric Particle Swarm Optimization (GPSO): GPSO is a recent version of PSO. The key idea in GPSO is the use of a multi-parental recombination of solutions (particles). In the first phase, a random initialization of the particles is created. The algorithm then evaluates these particles to update the historical and social positions. Finally, a three-parent mask-based crossover (3PMBCX) moves the particles, as shown in Algorithm 2.

Algorithm 2 GPSO algorithm [2]
  S := SwarmInitialization()
  while not stop condition do
    for each particle i of the swarm S do
      evaluate(solution(xi))
      update(velocity equation (hi))
      update(global best solution (gi))
    end for
    for each particle i of the swarm S do
      xi := 3PMBCX((xi, wa), (gi, wb), (hi, wc))
      mutate(xi)
    end for
  end while
  Output: best solution found

2) Parallel Multi-Swarm Optimization (PMSO): Parallel multi-swarm optimization is presented in [2]. It is defined, in analogy with the parallel GA, as a pair (S, M), where S is a collection of swarms and M is a migration policy. Algorithm 3 depicts the multi-swarm methodology.

Algorithm 3 Multi-swarm optimization [2]
  DO IN PARALLEL for each i = 1,...,m: initialize(Si)
  while not stop condition do
    iterate Si for n steps /* PSO evolution */
    for each Sj in the neighborhood of Si do
      send selected particles of Si to Sj
    end for
    for each Sj such that Si is in the neighborhood of Sj do
      receive particles from Sj
      replace particles in Si according to the replacement policy r
    end for
  end while
  Output: best solution ever found in the multi-swarm
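The (S, M) pair can be illustrated with a short Python sketch (ours, and deliberately simplified: the swarms are iterated sequentially and the PSO step is a dummy perturbation, whereas a real PMSO would run each swarm on its own processor):

import random

def pso_steps(swarm, n):
    """Stand-in for n PSO iterations on one swarm: perturb positions
    and rescore with a toy objective (maximize -||x||^2)."""
    for _ in range(n):
        for p in swarm:
            p["pos"] = [x + random.uniform(-0.1, 0.1) for x in p["pos"]]
            p["fit"] = -sum(v * v for v in p["pos"])
    return swarm

def migrate_ring(swarms, k=1):
    """Migration policy M: each swarm sends its k best particles to the
    next swarm on a ring, replacing that swarm's k worst particles."""
    bests = [sorted(s, key=lambda p: p["fit"], reverse=True)[:k] for s in swarms]
    for i, s in enumerate(swarms):
        incoming = bests[(i - 1) % len(swarms)]
        s.sort(key=lambda p: p["fit"])       # worst particles first
        s[:k] = [dict(p) for p in incoming]  # replace the k worst
    return swarms

# A collection S of m = 4 swarms of 10 particles each.
swarms = [[{"pos": [random.uniform(-1, 1) for _ in range(3)], "fit": 0.0}
           for _ in range(10)] for _ in range(4)]
for epoch in range(5):
    swarms = [pso_steps(s, n=20) for s in swarms]
    swarms = migrate_ring(swarms, k=2)
best = max((p for s in swarms for p in s), key=lambda p: p["fit"])
print(best["fit"])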
D. Parallel Scatter Search (PSS)

Scatter search is an evolutionary method that has been successfully applied to hard optimization problems. It uses strategies for search diversification and intensification that have proved effective in a variety of optimization problems; see Algorithm 4.

Algorithm 4 Parallel scatter search methodology [11]
  Create Population (Pop, PopSize)
  Generate ReferenceSet (RefSet, RefSetSize)
  while Stopping Criterion 1 do
    while Stopping Criterion 2 do
      Select Subset (Subset, SubsetSize)
      for each processor r = 1 to n do in parallel
        Combine Solutions (SubSet, CurSol)
        Improve Solution (CurSol, ImpSol)
      end for
    end while
    Update ReferenceSet (RefSet)
  end while

E. Parallel Ant Colony Optimization (PACO)

When dealing with a huge search space, parallel computing techniques are usually applied to improve efficiency. Parallel ACO algorithms can achieve high-quality solutions in reasonable execution times compared with sequential ACO [18]. Algorithm 5 presents the PACO methodology.

Algorithm 5 Parallel ant colony optimization methodology [18]
  Generate ants
  Initialize N processors
  Multicast to all slave processors N and the task ids of all slaves
  for each slave do
    Send a number between 0 and N that identifies the task inside the program
  end for
  while not all slaves have sent back a solution do
    Wait for a solution
    if a slave returns a solution that is better than any solution received so far then
      Multicast this solution to all slaves
    end if
  end while
  Return the best solution
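For intuition about what each ant computes when ACO is applied to feature selection, the following Python sketch (ours; a generic sequential ACO step with a placeholder fitness, not the exact procedure of [18]) builds feature subsets from pheromone values and reinforces the pheromones of the best subset:

import random

def build_subset(pheromone, k):
    """One ant selects k features, sampling proportionally to pheromone."""
    chosen, candidates = [], list(range(len(pheromone)))
    for _ in range(k):
        weights = [pheromone[i] for i in candidates]
        pick = random.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)
    return chosen

def evaluate(subset):
    """Placeholder fitness: a real system would train a classifier on
    these features and return its accuracy."""
    return random.random()

n_features, n_ants, rho = 50, 8, 0.1            # rho = evaporation rate
pheromone = [1.0] * n_features
for iteration in range(20):
    # In PACO, each ant would run on its own processor.
    ants = [build_subset(pheromone, k=10) for _ in range(n_ants)]
    scores = [evaluate(s) for s in ants]
    best = ants[scores.index(max(scores))]
    pheromone = [t * (1 - rho) for t in pheromone]  # evaporation
    for i in best:                                  # reinforcement
        pheromone[i] += max(scores)
print(sorted(best))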
IV. PARALLEL EVOLUTIONARY ALGORITHMS FOR FEATURE SELECTION

We reviewed a set of research papers dealing with the feature selection problem for high dimensional datasets in a parallel environment using parallel evolutionary algorithms. We discuss these studies in the following subsections.

A. Parallel GA

Liu et al. [5] used a parallel GA with a wrapper approach for selecting informative genes (features) in tissue classification. The main purpose was to find the subset of features with fewer elements and higher accuracy. The parallelization of the GA was performed by dividing the population into sub-populations and then running the GA on each sub-population, so that the search for the optimal subset of genes can proceed on several CPUs/computers at the same time. For evaluation, the Golub classifier was used. This classifier was introduced by the authors, and it depends on the sign of the result for classification: if the sign is positive, the sample x belongs to class 1; if it is negative, the sample x belongs to class 2. This classifier can be used only if the dataset has two classes. The accuracy of the classifier was tested using the LOOCV (leave-one-out cross-validation) method. The results showed that using the parallel GA increased the accuracy and reduced the number of genes used for classification.

In [8], Zheng et al. theoretically analyzed the execution speed and solution quality of several parallel GA schemes, and pointed to the best scheme of parallel GA to use on a multi-core architecture. The paper considered the relationship between speed and parallel architecture along with solution quality. They analyzed the Master-Slave, Synchronous Island, Asynchronous Island, Cellular, and hybrid Master-Slave/Island schemes of parallel GA, implemented with the Pthread library on a multi-core architecture, and performed experiments to validate the theoretical analysis. The hybrid scheme (Master-Slave combined with Asynchronous Island) had the best overall performance on the multi-core architecture. The Island scheme had the best execution time but the worst solution quality; to improve the solution quality when using the Island model, it is better to decrease the number of islands. The Asynchronous Island scheme is faster than the Synchronous one. The Master-Slave scheme had the best solution quality and the worst execution time.
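A minimal Python sketch of the sub-population (island) parallelization described above, assuming a generic wrapper fitness; we use the multiprocessing module to evolve each sub-population in its own process, and the helper names and placeholder fitness are ours:

import random
from multiprocessing import Pool

def fitness(mask):
    """Placeholder: a real wrapper would return classifier accuracy
    (e.g., LOOCV) on the features where mask[i] == 1. This toy score
    simply prefers masks with about 10 selected genes."""
    return -abs(sum(mask) - 10)

def evolve(subpop, generations=200):
    """Run a plain steady-state GA on one sub-population."""
    for _ in range(generations):
        a, b = random.sample(subpop, 2)
        cut = random.randrange(1, len(a))
        child = a[:cut] + b[cut:]              # one-point crossover
        i = random.randrange(len(child))
        child[i] = 1 - child[i]                # bit-flip mutation
        worst = min(range(len(subpop)), key=lambda j: fitness(subpop[j]))
        subpop[worst] = child                  # replace the worst
    return max(subpop, key=fitness)

if __name__ == "__main__":
    n_genes, pop_size, n_islands = 100, 40, 4
    population = [[random.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    islands = [population[i::n_islands] for i in range(n_islands)]
    with Pool(n_islands) as pool:              # one process per island
        champions = pool.map(evolve, islands)
    print(max(champions, key=fitness))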
Soufan et al. [15] developed a web-based tool called DWFS, which is used for feature selection in different problems. The tool follows a hybrid wrapper/filter approach. First, a filter is used as a preprocessing step to select the top-ranked features based on a tunable, predefined threshold. In the next step, a wrapper-based parallel GA is applied to the selected features to search for the feature subsets that increase the classifier accuracy. The scheme of the parallel GA was Master-Slave: the master node created the initial population and ran the GA steps, while the slave (worker) nodes performed the fitness evaluation of each chromosome. This implementation ran on 64 cores. For evaluation, they used three different classifiers (a Bayesian classifier, K-nearest neighbors, and a combination of them). The results of the experiments show that the DWFS tool provides many options to enhance feature selection in different biological and biomedical problems.
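The filter-then-wrapper pipeline can be summarized in a few lines of Python. This is a hedged sketch of the general idea only: the scoring function, the random-restart stand-in for the GA, and all names are ours, not DWFS internals:

import random

def filter_score(X, y, j):
    """Univariate filter: absolute difference of the class means of
    feature j (a simple stand-in for the filter ranking)."""
    a = [row[j] for row, label in zip(X, y) if label == 0]
    b = [row[j] for row, label in zip(X, y) if label == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def wrapper_accuracy(X, y, subset):
    """Placeholder for the wrapper fitness: train and validate a
    classifier restricted to `subset` and return its accuracy."""
    return random.random()

def hybrid_select(X, y, top_k=20, iters=100):
    # Stage 1 (filter): keep only the top_k ranked features.
    ranked = sorted(range(len(X[0])),
                    key=lambda j: filter_score(X, y, j),
                    reverse=True)[:top_k]
    # Stage 2 (wrapper): random-restart search as a stand-in for the GA.
    best, best_acc = None, -1.0
    for _ in range(iters):
        cand = random.sample(ranked, k=random.randint(1, top_k))
        acc = wrapper_accuracy(X, y, cand)
        if acc > best_acc:
            best, best_acc = cand, acc
    return best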
In [7], Pinho et al. presented ParJEColi (a Java-based library), a framework for parallel evolutionary algorithms in bioinformatics applications. The aim of this platform is to make the parallel environment (multi-core, cluster, and grid) easy and transparent to the user. The library adapts itself to the problem and to the target parallel architecture, and the user can easily configure the parallel model and the target architecture, since ParJEColi encapsulates the parallelization concerns as features. The explicit steps are carried out through a simple GUI. The framework was validated with experiments on two biological datasets and several bioinformatics scenarios. The results indicate that the proposed framework improves the computational performance (decreases the execution time) as well as the solution quality.

B. Parallel CHC

In [1], Peralta et al. presented a parallel version of the CHC evolutionary algorithm, using the MapReduce paradigm, for selecting features in high dimensional datasets in order to improve classification. The parallelization of the CHC algorithm was done with the MapReduce procedure (Hadoop implementation), using a cluster of 20 computing nodes, and each dataset was split into 512 map tasks. For evaluation, three classifiers were used: SVM (support vector machine), logistic regression, and a Bayesian classifier. The results showed that the run time for classification increased as the number of features decreased, except for the Bayesian classifier. They explained this result as follows: if the number of blocks is less than the number of computing machines, some machines remain idle, and if the number of blocks is greater than the number of computing machines, the blocks may not be distributed in an efficient way. They compared parallel CHC with the serial version and concluded that the accuracy of classification increases with parallel CHC; furthermore, the parallel version of CHC reduces the run time when the dataset is high dimensional.

C. Parallel PSO

PSO is an efficient optimization technique that is used to solve the feature selection problem in high dimensional datasets. In [4], Chen et al. used a parallel PSO algorithm to solve two problems at the same time, by creating an objective function that takes three variables into account simultaneously (the selected features, the number of support vectors, and the average accuracy of the SVM), in order to maximize the generalization capability of the SVM classifier. The proposed method, called PTVPSO-SVM (parallel time-variant particle swarm optimization support vector machine), has two phases: 1) the parameter setting of the SVM and the feature selection are performed together; 2) the accuracy of the SVM is evaluated using the set of features and the optimal parameters from the first phase. They used a parallel virtual machine (PVM) with 8 machines and 10-fold cross-validation. The results showed that the method achieved the following aims: increased classification accuracy, reduced execution time compared with sequential PSO, an appropriate parameter model, and selection of the most discriminative subset of features.
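A sketch of this kind of combined objective, assuming scikit-learn is available; the particle encoding and weights are our guesses for illustration, and the exact PTVPSO-SVM objective (which also counts support vectors) is simplified here:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def fitness(particle, X, y, w_acc=0.8, w_feat=0.2):
    """Decode one particle into (C, gamma, feature mask) and score it:
    reward cross-validated accuracy, penalize large feature subsets."""
    C, gamma = 10.0 ** particle[0], 10.0 ** particle[1]  # log-scaled params
    mask = particle[2:] > 0.5                            # continuous -> binary
    if not mask.any():
        return -1.0                                      # empty subset: invalid
    clf = SVC(C=C, gamma=gamma)
    acc = cross_val_score(clf, X[:, mask], y, cv=10).mean()
    return w_acc * acc - w_feat * mask.mean()

# Each evaluation is independent, so a PVM-style master can farm the
# swarm out to workers (e.g., with multiprocessing.Pool) and collect scores.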
Feature selection can also be carried out based on rough set theory combined with a search algorithm, as in [3, 6]. In [6], Qian et al. proposed three parallel attribute reduction (feature selection) algorithms based on MapReduce on Hadoop. The first algorithm was built by constructing the proper (key, value) pairs from rough set theory and implementing the corresponding MapReduce functions. The second realized the parallel computation of equivalence classes and attribute significances. The last was designed to acquire the core attributes and a reduct, with parallelism in both the data and the tasks. The experiments were performed on a cluster of 17 computing nodes. They considered the performance of the parallel algorithms but did not focus on classification accuracy, since the sequential and parallel algorithms give the same results. The results showed that the proposed parallel attribute reduction algorithms can deal with high dimensional datasets efficiently and better than the sequential algorithms.

In [3], Adamczyk used rough set theory for attribute reduction and, to increase efficiency, implemented a parallel asynchronous PSO for this problem. The parallelization was done by assigning the complex function computations to slave cores, while the main core updates the particles and checks the convergence of the algorithm. The experiments showed that the efficiency and speedup of the parallel PSO algorithm rise as the size of the dataset increases. The achieved accuracy was not astonishing, but it was better than that of the classical algorithms.
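Equivalence classes, the building block of the rough-set algorithms in [3, 6], are easy to state in code: records are grouped by their values on the candidate attributes. In the Python sketch below (ours; the MapReduce shuffle of [6] is simulated with a dictionary rather than Hadoop), the attribute-value tuple plays the role of the key and the row index the role of the value:

from collections import defaultdict

def equivalence_classes(rows, attributes):
    """Group row indices by their values on `attributes` (the map and
    shuffle steps of the MapReduce formulation)."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attributes)
        classes[key].append(i)
    return dict(classes)

# Toy decision table: the last column is the decision attribute.
data = [("sunny", "hot", "no"),
        ("sunny", "mild", "no"),
        ("rainy", "hot", "yes"),
        ("rainy", "mild", "yes")]
print(equivalence_classes(data, attributes=[0]))
# {('sunny',): [0, 1], ('rainy',): [2, 3]}: attribute 0 alone already
# separates the two decision classes, so it is a strong reduct candidate.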
D. Parallel GPSO

In [2], Garcia-Nieto et al. parallelized a version of PSO called GPSO, which is suitable for the feature selection problem in high dimensional datasets. The proposed method, called PMSO (parallel multi-swarm optimizer), runs a set of parallel sub-PSO algorithms that form an island model, in which a migration operation exchanges solutions between islands at a certain frequency. The aim of the fitness function is to increase the classification accuracy while reducing the number of selected genes (features). They used the SVM (Support Vector Machine) classifier to assess the accuracy of the selected subsets of features, and a cluster of computers as the parallel architecture. They found that the 8-swarm PMSO was the best choice for parallelization. The results pointed out that this algorithm outperformed the sequential version and other methods in terms of performance and accuracy, while selecting few genes for each subset.

E. Parallel SS

In [11], Lopez et al. presented a parallel SS metaheuristic for solving the feature selection problem in classification. They proposed two methods for combining solutions in SS. The first, called GC (greedy combination), first adds the features common to the combined solutions and then, at each iteration, adds one of the remaining features to the new solution. The second, called RGC (reduced greedy combination), starts like GC, but in the next step it considers only the features that appear in solutions of good quality. The parallelization of SS is obtained by running the two methods (GC, RGC) at the same time on two processors, using different combination methods and parameter settings on each processor. They compared the proposed parallel SS with sequential SS and with a GA. The results show that the quality of the solutions from the parallel SS is better than that of the solutions obtained from the sequential SS and the GA; the parallel SS also uses a smaller set of features for classification, while the run time is the same for the parallel and sequential SS.
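A hedged Python sketch of the greedy combination idea: start from the features shared by the parent solutions, then grow the subset greedily. The names, the toy scoring function, and the size cap are ours, not the exact operator of [11]:

def greedy_combine(sol_a, sol_b, score, target_size):
    """GC-style combination of two feature subsets (sets of indices):
    keep the common features, then greedily add remaining ones."""
    combined = set(sol_a) & set(sol_b)           # shared features first
    remaining = (set(sol_a) | set(sol_b)) - combined
    while remaining and len(combined) < target_size:
        # Add the remaining feature that improves the evaluation most.
        best = max(remaining, key=lambda f: score(combined | {f}))
        combined.add(best)
        remaining.remove(best)
    return combined

# Example with a toy score that prefers low feature indices.
score = lambda s: -sum(s)
print(greedy_combine({1, 2, 5}, {2, 3, 5, 8}, score, target_size=4))
# -> {1, 2, 3, 5}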
F. Parallel ACO

This subsection shows how parallel ACO is used to solve the feature selection problem for classification in high dimensional datasets. In [17], Meena et al. implemented a parallel ACO to solve the feature selection problem for long documents. The parallelization was done using the MapReduce programming model (Hadoop), which automatically parallelizes the code and data and runs them on a cluster of computing nodes. The wrapper approach, with a Bayesian classifier, was used as the evaluation criterion, and the accuracy of the classifier was measured with these metrics: precision, recall, accuracy, and F-measure. The enhanced algorithm (parallel ACO) was compared with ACO, enhanced ACO, and two feature selection methods: CHI (a statistical technique) and IG (Information Gain). The results showed that, for a given fixed quality of the solutions, the proposed algorithm reduced the execution time. Moreover, the accuracy of the classifier increased when using parallel ACO compared with sequential ACO and the feature selection methods. Table I summarizes the algorithms and programming models used in the reviewed papers.

TABLE I
SUMMARY OF ALGORITHMS AND PROGRAMMING MODELS

Paper                   | Evolutionary algorithm | Parallel programming model
Peralta et al. [1]      | CHC (a type of GA)     | MapReduce
Garcia-Nieto et al. [2] | GPSO                   | MALLBA
Adamczyk [3]            | PSO                    | Unknown
Chen et al. [4]         | PSO                    | PVM
Liu et al. [5]          | GA                     | Unknown
Lopez et al. [11]       | SS                     | Unknown
Soufan et al. [15]      | GA                     | MPI
Meena et al. [17]       | ACO                    | MapReduce

In [12], Cano et al. parallelized an existing multi-objective ant programming model that is used as a classifier. This algorithm was used for rule mining in high dimensional datasets. The parallelization was applied to the data, with each ant encoding a rule; each processor performs the same task on a different subset of the data at the same time. The implementation used GPUs, which are many-core, massively parallel processor architectures, following the CUDA model. For evaluation, they used these metrics: true positives, false positives, true negatives, false negatives, sensitivity, and specificity. The results indicate that the efficiency of this model increased as the size of the dataset increased.

V. SUMMARY AND DISCUSSION

The summary of the papers that implemented parallel EAs for solving the classification problem in high dimensional datasets is reported in Table I and Table II. Many research papers [2, 3, 7, 8, 9, 10, 12] stated that we can reduce the execution time and achieve acceptable speedups when applying parallel evolutionary algorithms on multiple processors, and we noticed that a reasonable speedup was achieved in many cases.

When comparing the accuracy of the parallel EAs in Table II, it is important to notice how many classifiers were used to measure the accuracy, and to consider the metrics that were used to evaluate the classifier. For example, the parallel PSO and its variants show the highest accuracy, but they used only one metric, the success rate; this means that the parallel PSO is not necessarily the most accurate parallel EA based on Table II. On the other hand, the parallel GA and its variants show the lowest accuracy, but they were evaluated with two to five metrics. Based on these metrics, we can say that the parallel GA is the best parallel EA for feature selection in high dimensional datasets.

TABLE II
SUMMARY OF DATASETS, CLASSIFIERS, AND ACCURACY RESULTS

Paper                   | Dataset     | Classifier          | Metrics                | Accuracy
Peralta et al. [1]      | Epsilon     | Bayesian            | AUC = (TPR+TNR)/2      | 0.71
                        |             | SVM                 |                        | 0.68
                        |             | Logistic Regression |                        | 0.70
                        | ECBDL14-ROS | Bayesian            |                        | 0.67
                        |             | SVM                 |                        | 0.63
                        |             | Logistic Regression |                        | 0.63
Garcia-Nieto et al. [2] | Colon       | SVM                 | Success rate           | 0.85
                        | Lymp        |                     |                        | 0.97
                        | Leuk        |                     |                        | 0.98
                        | Lung        |                     |                        | 0.97
Adamczyk [3]            | 15 datasets | -                   | Success rate           | 0.70 (avg)
Chen et al. [4]         | 30 datasets | SVM                 | Success rate           | 0.87 (avg)
Liu et al. [5]          | Leukemia    | Golub               | Success rate           | 0.88
                        | Colon       |                     |                        | N/A
Lopez et al. [11]       | 12 datasets | Nearest Neighbor    | Success rate           | 0.86 (avg)
                        |             | Bayesian            |                        | 0.87 (avg)
                        |             | Decision Tree       |                        | 0.86 (avg)
Soufan et al. [15]      | 9 datasets  | K-Nearest Neighbor  | F1, PPV, GMean, ...    | 0.81 (avg GMean)
                        |             | Bayesian            |                        | 0.79 (avg GMean)
Meena et al. [17]       | 2 datasets  | Bayesian            | F-measure, recall, ... | 0.64 (avg)

VI. CONCLUSION

We have reviewed different parallel EAs that are used to solve the feature selection problem in high dimensional datasets, adopting accuracy as the measure to compare the algorithms' performance. The following points summarize our conclusions about the performance of the reviewed algorithms for feature selection:
• GA and its variants: based on the papers we reviewed, the parallel GA has the highest accuracy.
• PSO and its variants: the parallel PSO has the same accuracy as the sequential PSO.
• SS: the parallel SS gives better results in terms of accuracy than the GA and the sequential SS.
• ACO: the parallel ACO has less accurate results than the other parallel EAs.
It is noticeable that PGAs are the most suitable algorithms for feature selection in large datasets, since they achieve the highest accuracy. On the other hand, PACO is time-consuming and less accurate compared with the other parallel EAs.

References
[1] D. Peralta, S. del Rio, S. Ramirez-Gallego, I. Triguero, J. Benitez, F. Herrera. "Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach". Mathematical Problems in Engineering, Hindawi Publishing Corporation, Article ID 246139, 2015.
[2] J. Garcia-Nieto, E. Alba. "Parallel multi-swarm optimizer for gene selection in DNA microarrays". Applied Intelligence 37:255-266, 2012. DOI 10.1007/s10489-011-0325-9.
[3] M. Adamczyk. "Parallel Feature Selection Algorithm based on Rough Sets and Particle Swarm Optimization". ACSIS, Vol. 2, IEEE, 2014. DOI 10.15439/2014F389.
[4] H. L. Chen, B. Yang, S. Wang, G. Wang, D. Liu, H. Z. Li, W. Liu. "Towards an optimal support vector machine classifier using a parallel particle swarm optimization strategy". Applied Mathematics and Computation 239:180-197, Elsevier, 2014.
[5] J. Liu, H. Iba. "Selecting Informative Genes with Parallel Genetic Algorithms in Tissue Classification". Genome Informatics 12:14-23, 2001.
[6] J. Qian, D. Miao, Z. Zhang, X. Yue. "Parallel attribute reduction algorithms using MapReduce". Information Sciences, Elsevier, 2014.
[7] J. Pinho, L. Sobral, M. Rocha. "Parallel evolutionary computation in bioinformatics applications". Computer Methods and Programs in Biomedicine 110:183-191, Elsevier, 2013.
[8] L. Zheng, Y. Lu, M. Ding, Y. Shen, M. Guo, S. Guo. "Architecture-based performance evaluation of genetic algorithms on multi/many-core systems". The 14th IEEE International Conference on Computational Science and Engineering, 2011. DOI 10.1109/CSE.2011.65.
[9] S. Santander-Jimenez, M. Vega-Rodriguez. "Parallel Multiobjective Metaheuristics for Inferring Phylogenies on Multicore Clusters". IEEE Transactions on Parallel and Distributed Systems, 2014. DOI 10.1109/TPDS.2014.2325828.
[10] J. Petrlik, L. Sekanina. "Towards Robust and Accurate Traffic Prediction Using Parallel Multiobjective Genetic Algorithms and Support Vector Regression". IEEE 18th International Conference on Intelligent Transportation Systems, 2015.
[11] F. Lopez, M. Torres, B. Batista, J. Perez, J. Vega. "Solving feature selection problem by a parallel Scatter Search". European Journal of Operational Research 169:477-489, Elsevier, 2006.
[12] A. Cano, J. Olmo, S. Ventura. "Parallel Multi-Objective Ant Programming for Classification Using GPUs". Journal of Parallel and Distributed Computing, November 2012.
[13] E. Alba, G. Luque, S. Nesmachnow. "Parallel metaheuristics: recent advances and new trends". International Transactions in Operational Research 20:1-48, 2013. DOI 10.1111/j.1475-3995.2012.00862.x.
[14] H. Liu, H. Motoda (Eds.). "Instance Selection and Construction for Data Mining". Springer, 2001.
[15] O. Soufan, D. Kleftogiannis, P. Kalnis, V. Bajic. "DWFS: A Wrapper Feature Selection Tool Based on a Parallel Genetic Algorithm". PLOS ONE, February 2015. DOI 10.1371/journal.pone.0117988.
[16] A. Gottlieb, G. S. Almasi. "Highly Parallel Computing". Benjamin-Cummings Publishing Co., Redwood City, CA, USA, 1989.
[17] M. Meena, K. R. Chandran, A. Karthik, A. Samuel. "A parallel ACO algorithm to select terms to categorise longer documents". International Journal of Computational Science and Engineering, Vol. 6, No. 4, 2011.
[18] A. Sameh, A. Ayman, N. Hasan. "Parallel Ant Colony Optimization". International Journal of Research and Reviews in Computer Science (IJRRCS), Vol. 1, No. 2, June 2010.