An effective classification approach for big data with parallel generalized Hebbian algorithm

Bulletin of Electrical Engineering and Informatics
Vol. 10, No. 6, December 2021, pp. 3393~3402
ISSN: 2302-9285, DOI: 10.11591/eei.v10i6.3135
Journal homepage: http://beei.org
Ahmed Hussein Ali1, Royida A. Ibrahem Alhayali2, Mostafa Abdulghafoor Mohammed3, Tole Sutikno4
1 ICCI, Informatics Institute for Postgraduate Studies, Baghdad, Iraq
1,2 Department of Computer, College of Education, AL-Iraqia University, Iraq
1 Department of Computer Science, AL Salam University College, Iraq
2 Department of Computer Engineering, College of Engineering, University of Diyala, Diyala, Iraq
3 Imam Aadham University College, Iraq
4 Department of Electrical Engineering, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
Article Info
Article history:
Received Jun 30, 2021
Revised Aug 19, 2021
Accepted Oct 31, 2021
ABSTRACT
Advancements in information technology are contributing to the excessive rate
of big data generation. Big data refers to datasets that are huge in volume and
consume much time and space to process and transmit using the available
resources. Big data also covers data in both unstructured and structured
formats. Many agencies are currently supporting research on big data analytics
owing to the failure of existing data processing techniques to handle the rate
at which big data is generated. This paper presents an efficient classification
and reduction technique for big data based on the parallel generalized Hebbian
algorithm (GHA), which is one of the commonly used principal component
analysis (PCA) neural network (NN) learning algorithms. The proposed method
was compared to existing methods to demonstrate its capabilities in reducing
the dimensionality of big data. The proposed method is implemented using the
Spark Radoop platform.
Keywords:
Big data
Generalized Hebbian algorithm
Machine learning
Neural network
Principal component analysis
Spark Radoop
This is an open access article under the CC BY-SA license.
Corresponding Author:
Ahmed Hussein Ali
Department of Computer Science, AL Salam University College
119 Baghdad, Taji, Iraq
Email: msc.ahmed.h.ali@gmail.com
1. INTRODUCTION
The challenge of big data lies in its size, volume, and rate of generation from multiple sources
(including machines and humans) [1]-[13]. There are many forms of big data, such as web and social media
data, business transaction data, machine-to-machine data, and biometric data [14]-[39]. Big data is not
merely a large database; it is often unstructured, and it is growing in all domains. High-dimensional input
data streams are important for most information processing tasks, such as communication and pattern
recognition, and reducing their dimensionality helps to remove noise and redundancy so that useful
information can be extracted from the input signals. Consequently, the ability to reduce data dimensionality
has made information processing, transmission, and storage easier in both software and hardware.
One of the common feature extraction methods is principal component analysis (PCA), which extracts useful
information by establishing the patterns in the input space. PCA mainly aims to obtain an accurate data
representation with fewer redundant components [40]-[47].
The PCA transform [48], [49] is mainly used for tasks such as pattern recognition, data compression,
and classification. It is also called the Karhunen-Loeve transform (KLT) [50]-[53]. Despite the wide
application of numerous PCA-based algorithms [14], [54], most are not suitable for real-time applications
due to their high computational complexity on high-dimensional feature vectors. A number of algorithms can
improve the computational speed of PCA, but the problem persists: most of these algorithms are implemented
only in software and achieve moderate performance. PCA and its variants can also be implemented in
hardware, but this requires substantial resources and complex circuit control systems; hence, it is only
considered for small dimensions. Implementation as a PCA neural network (NN) is another alternative [55].
This is done using the GHA [56], but the GHA converges slowly and therefore needs many iterations, which
prolongs the computational time of most GHA-based algorithms. Most data reduction techniques save bandwidth
and time by enabling users to process large datasets with minimal available resources. Because data mining
involves large volumes of data and aims to retrieve important information from them, data reduction has
become mandatory. Data size reduction is also a persistent problem because most straightforward techniques
only work on small data and not on big data. Hence, the software design stage is a crucial phase in building
data reduction algorithms for big data processing.
Recent works on parallel big data dimensionality reduction (DR) are reviewed in this section. The
study in [57] presented a hybrid PGO-SVM model that combined a support vector machine (SVM) with PGO to
improve classification accuracy even with a small number of feature subsets. The proposed PGO-SVM was
implemented with Spark Radoop, with the data points stored in a distributed manner in the Hadoop
distributed file system (HDFS). On large datasets, the proposed model classified more efficiently and
executed faster than the benchmark method. Scala was used as the programming language to implement the
PGO-SVM, and the Covtype and Higgs datasets were analyzed. Another study presented a fast HP-PL model [58]
as a new way of improving DR and classification accuracy. The system was implemented on Apache Spark
and was capable of selecting the best features within the shortest computational time. Even though the
level of improvement depends on the data features, the system performed well with respect to the number of
evaluated nodes for the tested datasets. The iterated PCA (IPCA) method [59] has been proposed for fault
detection in a continuously stirred tank reactor (CSTR) model. The proposed IPCA relies on the GHA to
address memory complexity problems. The fault detection problem is addressed in this way to facilitate
online, recursive computation of the principal components. The GHA was developed to define a function that
can merge all the major factors that affect the fault detection capability of the developed model.
Song et al. proposed the TOC-based PCA algorithm [60], which exploits the advantages of optical computing
in big data computation to solve the issues related to the PCA algorithm on electronic computers. The
parallel operation of the system greatly improved its efficiency. Another paper by Jian et al. [59]
investigated the convergence of the GHA using the DDT method and demonstrated that the GHA converges with
non-zero-approaching adaptive learning rates. Because these adaptive learning rates converge to non-zero
constants, computational round-off limitations are easily handled and tracking requirements can be
satisfied in real applications. As a generalization of the Hebbian learning paradigm, Eraldo and colleagues
[60] proposed a new adaptation strategy for linear neural networks. This paper presents an efficient
classification and reduction technique for big data based on the parallel GHA and implemented on the Spark
Radoop platform. The proposed method is compared to existing methods to demonstrate its capabilities in
reducing the dimensionality of big data.
This paper is organized as follows: Apache Spark Radoop is presented in section 2. The principle
of the GHA and the proposed method are presented in section 3. The materials and methods used in this work
are presented in section 4, the results and discussion are presented in section 5, and section 6 presents
the conclusion and possible future work.
2. APACHE SPARK RADOOP
RapidMiner Radoop extends the in-memory functionality of RapidMiner with sophisticated operators
that can be executed in Hadoop [61]-[73]. More than 60 operators are available for data transformation in
Radoop [61]. It is also capable of advanced and predictive modeling on Hadoop clusters in a distributed
manner. RapidMiner [74] is a data mining application. Radoop relies on RapidMiner Studio's visual workflow
designer to make the creation, implementation, and maintenance of predictive analytics in Hadoop as simple
as possible. Because of Radoop's code-free environment [62], [74] and built-in intelligence, the
intricacies of the system are kept to a minimum, allowing the operator to concentrate on addressing
business challenges rather than on technical concerns. Because the workflow execution is handled by Radoop
rather than the user, and all computations are executed in the Hadoop cluster that holds the data,
predictive analytics remains effective and scalable for terabytes (TB) and petabytes (PB) of data. Radoop
was developed as an extension to ensure that Hadoop and RapidMiner could work together seamlessly. It is
data science software that simplifies the process of preparing data for machine learning on Hadoop and
Spark (refer to Figure 1). Throughout RapidMiner Studio, all parallel operations and data processes are
implemented on the SparkRM platform within the Hadoop cluster so that Apache Spark can be used for task
execution, which broadens the applicability of the tool and enables stronger algorithms. Hive and Mahout
provide well-optimized data analytics routines and were therefore also used in this study. Figure 2 depicts
the overall framework for the integration of Hadoop into RapidMiner. In this study, an extension was
developed that connects closely with Hadoop while providing the same features as the memory-based
RapidMiner operations. The initial stage in creating a Radoop process is to include the RadoopNest
meta-operator, which contains the basic cluster parameters. This meta-operator serves as the foundation for
the remaining operators.
Figure 1. Spark Radoop architecture
Figure 2. The neural model for the GHA
3. THE PROPOSED GENERALIZED HEBBIAN ALGORITHM
The GHA [75] is a linear feedforward NN model that is well suited to unsupervised learning
applications and is often used for PCA. It is computationally efficient because it solves the eigenvalue
problem iteratively, which eliminates the need to compute the covariance matrix directly or to solve the
eigenvalue problem in closed form. As a result, the GHA was created as a solution to memory complexity
difficulties, particularly when dealing with large-scale datasets, as shown in Figure 2. To provide a
memory-efficient implementation, the GHA is designed to be flexible and adaptable to time-varying
distributions. Principal component analysis [75]-[81] is regarded as an attribute reduction method that is
beneficial when the data is derived from numerous attributes and contains some redundancy. Redundancy in
this context means that some attributes are correlated, most likely because they measure the same concept.
Because of this redundancy, the observed attributes can be reduced to a smaller number of principal
components (PCs), each of which represents part of the variation in the observed attributes. The PCA method
uses an orthogonal transformation to convert a set of data with correlated attributes into a set of values
referred to as principal components (uncorrelated attributes) [82]-[87]. The number of PCs is less than or
equal to the number of original attributes. The transformation is defined so that the first PC, which
accounts for the majority of the data variability, has the highest possible variance, and each succeeding
component has the highest possible variance under the condition that it is orthogonal to the preceding
PCs [88].
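As a minimal sketch of the iterative idea described above, the following Python code applies Sanger's rule, the standard GHA weight update, to zero-mean rows without forming a covariance matrix. The function name gha_fit, the learning rate, the number of epochs, and the weight initialization range are illustrative assumptions and are not taken from the implementation used in this study.

```python
import random


def gha_fit(rows, n_components, learning_rate=0.01, epochs=50, seed=0):
    """Estimate leading principal directions with Sanger's rule (GHA).

    rows: list of equal-length lists of floats, assumed to be zero-mean.
    Returns n_components weight vectors, one per output neuron.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    # Small random initial synaptic weights.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
               for _ in range(n_components)]
    for _ in range(epochs):
        for x in rows:
            # Linear neuron outputs: y_i = w_i . x
            y = [sum(wj * xj for wj, xj in zip(w, x)) for w in weights]
            # Sanger's rule:
            #   w_ij <- w_ij + lr * y_i * (x_j - sum_{k<=i} y_k * w_kj)
            updated = []
            for i, w in enumerate(weights):
                row = []
                for j, xj in enumerate(x):
                    recon = sum(y[k] * weights[k][j] for k in range(i + 1))
                    row.append(w[j] + learning_rate * y[i] * (xj - recon))
                updated.append(row)
            weights = updated
    return weights
```

Each update touches only the current row and the weight matrix, so memory use grows with the number of attributes and components rather than with the number of records.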
This section points out some of the considerations for implementing the proposed algorithm with
Radoop. Several steps are involved in the parallelization process. A virtual machine cluster was considered
for the tuning and testing of the experimental conditions. The experiments were carried out on three different
supercomputers. Because big data is considered in this work, the dataset is likely to contain a huge number
of transactions. As a result, the large transactional datasets are kept in the HDFS, while numerous data
fractions are distributed across the cluster nodes. The Spark engine executes jobs on the data partitions in
parallel. We generated and processed a collection of RDDs to construct the set of frequently occurring
1-itemsets, which were then arranged in descending order. The proposed PGHA is applicable to big data
streaming with classification methods. It can be used to reduce any dataset stored as HDFS files and
handles datasets with numeric features. Figure 3 presents the overall proposed algorithm for data reduction,
which can be implemented as Map and Reduce functions.
Proposed Parallel GHA
Input: S (dense array)
Output: T (reduced array)
Begin
  Start the Spark context (slave)
  Listen for the master connection
  Receive a dense array of data
  Check the number of columns M
  Parallelize the data with Spark (master)
  Collect N rows of data
  do in parallel
    Set the initial synaptic weights wij and thresholds θj to small random values, for example in [0, 1].
    Assign small positive values to the learning rate parameter and the forgetting factor.
    Then repeat the following two steps:
    Calculate the output of the neuron at iteration p.
    Update the weights in the network: wij(p + 1) = wij(p) + ∆wij(p), i, j = 1, 2, ..., n
  Send the reduced array T
  Close the connection
End
Figure 3. Overall proposed algorithm for data reduction
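The Map and Reduce structure of Figure 3 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the figure does not specify how the per-partition weights are combined, so element-wise averaging is assumed here, and the names local_gha_pass, parallel_gha, and project, the round-robin partitioning, and all default values are hypothetical. A list comprehension stands in for the distributed map step that Spark would run on the workers.

```python
from functools import reduce
import random


def local_gha_pass(rows, weights, learning_rate):
    """Map step: one pass of Sanger's rule over a single data partition."""
    weights = [w[:] for w in weights]  # work on a local copy
    for x in rows:
        y = [sum(wj * xj for wj, xj in zip(w, x)) for w in weights]
        updated = []
        for i, w in enumerate(weights):
            row = []
            for j, xj in enumerate(x):
                recon = sum(y[k] * weights[k][j] for k in range(i + 1))
                row.append(w[j] + learning_rate * y[i] * (xj - recon))
            updated.append(row)
        weights = updated
    return weights


def add_weights(a, b):
    """Reduce helper: element-wise sum of two weight matrices."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def parallel_gha(rows, n_components, n_partitions=3, rounds=30,
                 learning_rate=0.01, seed=0):
    """Alternate per-partition updates (map) and weight averaging (reduce)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
               for _ in range(n_components)]
    # Round-robin split, standing in for the data partitions on the workers.
    partitions = [rows[p::n_partitions] for p in range(n_partitions)]
    for _ in range(rounds):
        # Map: every partition starts from the same shared weights.
        local = [local_gha_pass(part, weights, learning_rate)
                 for part in partitions]
        # Reduce: average the locally updated weights (an assumption).
        total = reduce(add_weights, local)
        weights = [[v / n_partitions for v in row] for row in total]
    return weights


def project(rows, weights):
    """Reduced array T: each row projected onto the learned components."""
    return [[sum(wj * xj for wj, xj in zip(w, x)) for w in weights]
            for x in rows]
```

Calling project(rows, parallel_gha(rows, n_components=k)) returns an array with k columns per row, which corresponds to the reduced array T in the figure.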
4. MATERIAL AND METHOD
A number of supervised classification approaches were considered in this study, including Naïve
Bayes, K-Nearest Neighbours, NN, and Random Forest. Table 1 describes the system, while Table 2 describes
the six datasets that were used in the study. The results are presented as the computation times of parallel
GHA and parallel PCA on the identical hardware arrangement. The performance of Apache Spark and MLlib 2.0
was compared on the six large datasets, which were obtained from the UCI ML repository. The experiments in
this research were run on a Spark cluster with Apache Zeppelin 0.7.1 and an HDFS. The Spark cluster is made
up of four nodes: a master node that executes the driver application, three worker nodes, and a cluster
manager. The three worker nodes were configured in a manner similar to that shown in Figure 1. The three
worker nodes were each given a memory allocation of 48 GB and were configured with four executors (each
with a memory allocation of 4 GB) and two cores. Each worker was allotted three executors (each with a
memory size of 5 GB) while the master node was allotted two cores. A total of 16 GB of RAM was assigned to
the driver process. Scala 2.11.8 was used as the programming language for MLlib execution in the Spark 2.2.1
cluster, with Hadoop 2.7.3 serving as the distributed storage system. The amount of memory available to the
executors in each worker node was tuned to obtain the best possible performance.
Table 1. Description of the system
Operating system          Windows 10
CPU                       Intel® Core™ i7-6700 processor running at 3.40 GHz with eight cores
Memory                    16 GB
No. of workers            3
Computational framework   Apache Spark 2.2.1
Compatible framework      Radoop
DSS                       HDFS (Hadoop 2.7.3)
Code development editor   Apache Zeppelin 0.7.1
Coding language           Scala 2.11.8
Table 2. Datasets description
Data             No. of records   No. of attributes   No. of classes
Covtype          581,012          54                  7
Covtype-2        581,012          54                  2
Higgs            11,000,000       28                  2
Botnet Attacks   7,062,606        115                 10
Dota2            102,944          116                 2
SUSY             5,000,000        18                  2
5. RESULTS AND DISCUSSION
The initial step of the classical GHA is loading the whole dataset into memory, so the data size
must fit within the memory of the computer. Memoryup is a performance metric that assesses a parallel
clustering algorithm's ability to efficiently utilize the available memory space on each node. It is
computed by changing the memory size of each node while keeping the dataset and the number of nodes the
same. The new GHA approach instead scans the data by rows, so it can still be implemented even when the
data exceeds the computer memory size. In large datasets, a significant amount of CPU time is frequently
lost to the unnecessary processing of redundant and non-representative data. Deleting this type of data can
significantly increase processing performance. A further benefit of eliminating non-representative data
from huge datasets is that storing and transmitting these datasets becomes less difficult. The
computational advantages of the proposed system were evaluated using numerical examples. The computation
was performed on a third-generation Intel Core i7 2.8 GHz processor with 16 GB of DDR3 memory. The
programming language for all the algorithms in the proposed big data GHA was C++. The comparison assesses
the speed of the algorithm through a thorough examination of the running time of the PGHA using Radoop and
of parallel PCA. In this example, the support degree varies while the number of computer nodes remains at 3.
The runtimes with different support degrees for the Covtype, Covtype-2, Higgs, Botnet Attacks, Dota2, and
SUSY datasets are depicted in Figures 4(a)-(f).
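The row-scanning idea mentioned above can be illustrated with a short Python sketch that reads one record at a time and updates a single principal direction with Oja's rule, which is the one-component case of the GHA. The names stream_rows and streaming_oja, the CSV input format, and the default learning rate are assumptions made for illustration; the file handle stands in for a file stored in the HDFS, and the rows are assumed to be zero-mean.

```python
import csv
import random


def stream_rows(handle):
    """Yield one numeric row at a time so the dataset never sits in memory."""
    for record in csv.reader(handle):
        if record:  # skip blank lines
            yield [float(value) for value in record]


def streaming_oja(row_iter, n_features, learning_rate=0.01, seed=0):
    """Update one principal direction from rows as they arrive (Oja's rule).

    Returns the weight vector and the number of rows that were scanned.
    Only the current row and the weight vector are held in memory.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    seen = 0
    for x in row_iter:
        y = sum(wj * xj for wj, xj in zip(w, x))  # neuron output
        w = [wj + learning_rate * y * (xj - y * wj) for wj, xj in zip(w, x)]
        seen += 1
    return w, seen
```

For example, with an open text file handle f, streaming_oja(stream_rows(f), n_features=2) scans the file once from start to end, so the memory footprint is independent of the number of records.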
The algorithms are shown on the x-axis, while the running time is shown on the y-axis. As can be
seen in the graphs, both techniques appear to be more efficient when the support degree is increased.
Note that our approach appears to be faster than parallel PCA on all datasets, which is a significant
advantage. The performance of the proposed method was evaluated on a single processor because, if a
parallel algorithm is used, its performance may be overshadowed by the performance of the other
algorithms. Execution times and speed-up ratios are depicted in Figure 4 in relation to the number of
objects in the datasets for different numbers of processors.
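For reference, the speed-up ratio is conventionally defined as the serial running time divided by the parallel running time, and parallel efficiency as the speed-up divided by the number of workers. The helper below is a small sketch of these standard definitions; the function names are hypothetical, and no timing values from this study are assumed.

```python
def speedup(serial_seconds, parallel_seconds):
    """Speed-up ratio: serial running time over parallel running time."""
    if serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("running times must be positive")
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, n_workers):
    """Parallel efficiency: speed-up divided by the number of workers."""
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    return speedup(serial_seconds, parallel_seconds) / n_workers
```

An efficiency close to 1 indicates near-linear scaling with the number of workers.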
Figure 4. Running time of PGHA using Radoop compared to parallel PCA under different classification
experiments (Naïve Bayes, NN, K-Nearest Neighbours, and Random Forest); (a) Covtype, (b) Covtype-2,
(c) Higgs, (d) Botnet Attacks, (e) Dota2, (f) SUSY datasets
The analysis of the Covtype dataset is only possible when the number of processors in our computer
cluster system is equal to or greater than eight. When dealing with large amounts of data, the advantages of a
distributed memory system are readily apparent. The experimental results demonstrate that this parallel
reduction method has superior speed-up and linear scaling behavior (time complexity), and that it can
overcome space complexity limits by using the aggregate memory of the reduction system. The performance
evaluation of the new approach was based on the method of inducing the base classifier. The results of the
experiments showed that the PGHA, as a data reduction tool, reduced the run time compared to the parallel
PCA algorithm, as shown in Figure 4. In addition, our partial reduction method outperformed full reduction
methods in many real-world data analysis and data reduction applications.
6. CONCLUSION
The concept of parallel computing and parallel dimensionality reduction algorithms was introduced
in this study. This article proposed a parallel algorithm based on a classical DR algorithm to handle
effectively the issues encountered in big data mining. The proposed framework builds on previous studies
with the aim of reducing the high volume of input data features while retaining the relevant information.
To achieve this aim, both the GHA and the proposed parallel algorithm were used to improve DR and reduce
feature complexity. The evaluation results showed that the GHA was better at reducing the redundant
features of datasets. Future studies will focus on combining the proposed parallel GHA with other ML
methods, as well as on improving performance on more recent datasets using evolutionary optimization
techniques.
ACKNOWLEDGEMENTS
The authors would like to thank ICCI, Informatics Institute for Postgraduate Studies, Iraqia
University and Al Salam University College for their facilities, support and cooperation during this research,
and Universitas Ahmad Dahlan for supporting this collaborative research. Special thanks to the anonymous
reviewers for their valuable suggestions and constructive comments.
REFERENCES
[1] T. H. Davenport, P. Barth and R. Bean, “How Big Data Is Different,” MIT Sloan Manag. Rev., vol. 54, no. 1, pp.
22–24, 2012.
[2] V. Chang, “An ethical framework for big data and smart cities,” Technological Forecasting and Social Change,
vol. 165, 2021.
[3] N. A. N. M. Idros, H. Mohamed and R. Jenal, “The use of expert review in component development for customer
satisfaction towards E-hailing,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS),
vol. 17, no. 1. pp. 347–356, 2019, doi: 10.11591/ijeecs.v17.i1.pp347-356.
[4] K. Anam, C. Avian and M. Nuh, “Multilayer extreme learning machine for hand movement prediction based on
electroencephalography,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 6. pp. 2404–2410,
2020, doi: 10.11591/eei.v9i6.2626.
[5] M. N. F. Jamaluddin, A. Ismail, A. A. Rashid and T. T. O. Takleh, “Performance comparison of Java based parallel
programming models,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 16, no. 3. pp.
1577–1583, 2019, doi: 10.11591/ijeecs.v16.i3.pp1577-1583.
[6] M. B. Swidan, A. A. Alwan, S. Turaev and Y. Gulzar, “A model for processing skyline queries in crowd-sourced
databases,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 10, no. 2. pp. 798–
806, 2018, doi: 10.11591/ijeecs.v10.i2.pp798-806.
[7] Mustakim, N. K. Sari, Jasril, I. Kusumanto and N. G. I. Reza, “Eigenvalue of analytic hierarchy process as the
determinant for class target on classification algorithm,” Indonesian Journal of Electrical Engineering and
Computer Science (IJEECS), vol. 12, no. 3. pp. 1257–1264, 2018, doi: 10.11591/ijeecs.v12.i3.pp1257-1264.
[8] S. Berhil, H. Benlahmar and N. Labani, “A review paper on artificial intelligence at the service of human resources
management,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 18, no. 1. pp.
32–40, 2019, doi: 10.11591/ijeecs.v18.i1.pp32-40.
[9] W. A. Jbara, “Ear biometric verification approach based on morphological and geometric invariants,” Indonesian
Journal of Electrical Engineering and Computer Science (IJEECS), vol. 20, no. 3. pp. 1479–1484, 2020, doi:
10.11591/ijeecs.v20.i3.pp1479-1484.
[10] H. A. Razak, M. A. M. Saleh and N. M. Tahir, “Review on anomalous gait behavior detection using machine
learning algorithms,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 5. pp. 2090–2096,
2020, doi: 10.11591/eei.v9i5.2255.
[11] M. M. Nasr, F. K. Kamel and Y. S. Abd ElWahab, “A survey on predicting oil spills by studying its causes using
deep learning techniques,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22,
no. 1. pp. 580–589, 2021, doi: 10.11591/ijeecs.v22.i1.pp580-589.
[12] P. Chaudhury and H. K. Tripathy, “Optimising the parameters of a RBFN network for a teaching learning
paradigm,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 15, no. 1. pp. 435–
442, 2019, doi: 10.11591/ijeecs.v15.i1.pp435-442.
[13] A. S. I. Hilaiwah, H. A. A. Abed Allah, B. A. Abbas and T. Sutikno, “Live to learn: learning rules-based artificial
neural network,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 21, no. 1. pp.
558–565, 2021, doi: 10.11591/ijeecs.v21.i1.pp558-565.
[14] A. Labrinidis and H. V Jagadish, “Challenges and opportunities with big data,” Proc. VLDB Endow., vol. 5, no. 12,
pp. 2032–2033, 2012.
[15] Q. Shallal, Z. Hussien and A. A. Abbood, “Method to implement K-NN machine learning to classify data privacy in
IoT environment,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 20, no. 2.
pp. 985–990, 2020, doi: 10.11591/ijeecs.v20.i2.pp985-990.
[16] M. AbdullahAl-Hagery, M. AbdullahAl-Assaf and F. MohammadAl-Kharboush, “Exploration of the best
performance method of emotions classification for arabic tweets,” Indonesian Journal of Electrical Engineering
and Computer Science (IJEECS), vol. 19, no. 2. pp. 1010–1020, 2020, doi: 10.11591/ijeecs.v19.i2.pp1010-1020.
[17] E. Sutoyo and A. Almaarif, “Twitter sentiment analysis of the relocation of Indonesia’s capital city,” Bulletin of
Electrical Engineering and Informatics (BEEI), vol. 9, no. 4. pp. 1620–1630, 2020, doi: 10.11591/eei.v9i4.2352.
[18] A. R. Lubis, M. K. M. Nasution, O. S. Sitompul and E. M. Zamzami, “The effect of the TF-IDF algorithm in times
series in forecasting word on social media,” Indonesian Journal of Electrical Engineering and Computer Science
(IJEECS), vol. 22, no. 2. pp. 368–376, 2020, doi: 10.11591/ijeecs.v22.i2.pp368-376.
[19] E. S. Negara, R. Andryani and R. Amanda, “Network analysis of YouTube videos based on keyword search with
graph centrality approach,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22,
[20] M. N. Alraja, M. A. Hussein and H. M. S. Ahmed, “What affects digitalization process in developing economies?
An evidence from smes sector in oman,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 10, no. 1.
pp. 441–448, 2020, doi: 10.11591/eei.v10i1.2033.
[21] N. H. M. Kadir and S. Aliman, “Text analysis on health product reviews using r approach,” Indonesian Journal of
Electrical Engineering and Computer Science (IJEECS), vol. 18, no. 3. pp. 1303–1310, 2020, doi:
10.11591/ijeecs.v18.i3.pp1303-1310.
[22] S. Sangam and S. Shinde, “Sentiment classification of social media reviews using an ensemble classifier,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 16, no. 1. pp. 355–363, 2019,
doi: 10.11591/ijeecs.v16.i1.pp355-363.
[23] S. Manikam, S. Sahibudin and V. Kasinathan, “Business intelligence addressing service quality for big data
analytics in public sector,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 16,
[24] B. Jabir, N. Falih and K. Rahmani, “HR analytics a roadmap for decision making: Case study,” Indonesian Journal
of Electrical Engineering and Computer Science (IJEECS), vol. 15, no. 2. pp. 979–990, 2019, doi:
10.11591/ijeecs.v15.i2.pp979-990.
[25] M. A. B. W. Nordin, D. Vedenyapin, M. F. Alghifari and T. S. Gunawan, “The disruptometer: An artificial
intelligence algorithm for market insights,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 8, no. 2.
pp. 727–734, 2019, doi: 10.11591/eei.v8i2.1494.
[26] O. A. Dawood, O. I. Hammadi, K. Shaker and M. Khalaf, “Multi-dimensional cubic symmetric block cipher
algorithm for encrypting big data,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 6. pp.
2569–2577, 2020, doi: 10.11591/eei.v9i6.2475.
[27] W. A. R. Wan Mohd Isa, A. I. H. Suhaimi, N. Noordin, A. F. Harun, J. Ismail and R. A. Teh, “Factors influencing
cloud computing adoption in higher education institution,” Indonesian Journal of Electrical Engineering and
Computer Science (IJEECS), vol. 17, no. 1. pp. 412–419, 2019, doi: 10.11591/ijeecs.v17.i1.pp412-419.
[28] S. Wilson and R. Sivakumar, “Twitter data analysis using hadoop ecosystems and apache zeppelin,” Indonesian
10.11591/ijeecs.v16.i3.pp1490-1498.
[29] L. Y. Fang, N. F. M. Azmi, Y. Yahya, H. Sarkan, N. N. A. Sjarif and S. Chuprat, “Mobile business intelligence
acceptance model for organisational decision making,” Bulletin of Electrical Engineering and Informatics (BEEI),
vol. 7, no. 4. pp. 650–656, 2018, doi: 10.11591/eei.v7i4.1356.
[30] P. D. Ibnugraha, L. E. Nugroho and P. I. Santosa, “An approach for risk estimation in information security using
text mining and jaccard method,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 7, no. 3. pp. 393–
399, 2018, doi: 10.11591/eei.v7i3.847.
[31] M. Z. H. Jesmeen et al., “A survey on cleaning dirty data using machine learning paradigm for big data analytics,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 10, no. 3. pp. 1234–1243,
2018, doi: 10.11591/ijeecs.v10.i3.pp1234-1243.
[32] N. Prasanna Moorthi and Mathivananr, “A study about SOA based agriculture management data framework,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 9, no. 1. pp. 39–42, 2018, doi:
10.11591/ijeecs.v9.i1.pp39-42.
[33] A. M. Saleh, H. Y. Abuaddous, O. Enaizan and F. Ghabban, “User experience assessment of a COVID-19 tracking
mobile application (AMAN) in Jordan,” Indonesian Journal of Electrical Engineering and Computer Science
[34] Hertina et al., “Data mining applied about polygamy using sentiment analysis on twitters in indonesian perception,”
Bulletin of Electrical Engineering and Informatics (BEEI), vol. 10, no. 4. pp. 2231–2236, 2021, doi:
10.11591/EEI.V10I4.2325.
[35] T. A. Tran, J. Duangsuwan and W. Wettayaprasit, “A new approach for extracting and scoring aspect using
SentiWordNet,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 3. pp.
1731–1738, 2021, doi: 10.11591/ijeecs.v22.i3.pp1731-1738.
[36] I. S. Nasir, A. H. Mousa and I. L. Hussein Alsammak, “SMUPI-BIS: A synthesis model for users’ perceived impact
of business intelligence systems,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS),
[37] C. R. Pattnaik, S. N. Mohanty, S. Mohanty, J. M. Chatterjee, B. Jana and V. García-Díaz, “A fuzzy multi-criteria
decision-making method for purchasing life insurance in india,” Bulletin of Electrical Engineering and Informatics
(BEEI), vol. 10, no. 1. pp. 344–356, 2021, doi: 10.11591/eei.v10i1.2275.
[38] N. S. Shaeeali, A. Mohamed and S. Mutalib, “Customer reviews analytics on food delivery services in social
media: A review,” IAES International Journal of Artificial Intelligence (IJAI), vol. 9, no. 4, pp. 691–699, 2020, doi:
10.11591/ijai.v9.i4.pp691-699.
[39] A. S. Oh, “Smart urban farming service model with IoT based open platform,” Indonesian Journal of Electrical
Engineering and Computer Science, vol. 20, no. 1, pp. 320–328, 2020, doi: 10.11591/ijeecs.v20.i1.pp320-328.
[40] N. M. Mahfuz, M. Yusoff and Z. Ahmad, “Review of single clustering methods,” IAES International Journal of
Artificial Intelligence, vol. 8, no. 3, pp. 221–227, 2019, doi: 10.11591/ijai.v8.i3.pp221-227.
[41] F. A. N. Rashid, N. S. Suriani and A. Nazari, “Kinect-based physiotherapy and assessment: A comprehensive
review,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 11, no. 3, pp. 1176–
1187, 2018, doi: 10.11591/ijeecs.v11.i3.pp1176-1187.
[42] Z. Faisal and N. K. El Abbadi, “Detection and recognition of brain tumor based on DWT, PCA and ANN,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 18, no. 1, pp. 56–63, 2019, doi:
10.11591/ijeecs.v18.i1.pp56-63.
[43] A. A. Vinaya, S. Yulianto, Q. A. M. O. Arifianti, D. Arifianto and A. S. Aisjah, “Machinery signal separation using
non-negative matrix factorization with real mixing,” Bulletin of Electrical Engineering and Informatics (BEEI),
vol. 9, no. 4, pp. 1468–1476, 2020, doi: 10.11591/eei.v9i4.1956.
[44] M. S. Abdul Razak and C. R. Nirmala, “A computing model for trend analysis in stock data stream classification,”
2020, doi: 10.11591/ijeecs.v19.i3.pp1602-1609.

[45] K. Gangadharan, G. R. N. Kumari, D. Dhanasekaran and K. Malathi, “Detection and classification of various pest
attacks and infection on plants using RBPN with GA based PSO algorithm,” Indonesian Journal of Electrical
Engineering and Computer Science (IJEECS), vol. 20, no. 3, pp. 1278–1288, 2020, doi:
10.11591/ijeecs.v20.i3.pp1278-1288.
[46] K. Okokpujie, S. John, C. Ndujiuba, J. A. Badejo and E. Noma-Osaghae, “An improved age invariant face
recognition using data augmentation,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 10, no. 1, pp.
179–191, 2021, doi: 10.11591/eei.v10i1.2356.
[47] S. K. Addagarla and A. Amalanathan, “e-SimNet: A visual similar product recommender system for E-commerce,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 1, pp. 563–570, 2021,
doi: 10.11591/ijeecs.v22.i1.pp563-570.
[48] A. M. Martinez and A. C. Kak, “PCA versus LDA,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 23, no. 2, pp. 228–233, Feb. 2001, doi: 10.1109/34.908974.
[49] M. A. Ahmed, R. A. Hasan, A. H. Ali and M. A. Mohammed, “The classification of the modern arabic poetry using
machine learning,” TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol. 17, no. 5, pp.
2667–2674, 2019, doi: 10.12928/telkomnika.v17i5.12646.
[50] I. Kamal, K. Housni and Y. Hadi, “Online dictionary learning for car recognition using sparse coding and lars,”
IAES International Journal of Artificial Intelligence (IJAI), vol. 9, no. 1, pp. 164–174, 2020, doi:
10.11591/ijai.v19i1.pp164-174.
[51] B. Vijayalaxmi, C. Anuradha, K. Sekaran, M. N. Meqdad and S. Kadry, “Image processing based eye detection
methods a theoretical review,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 3, pp. 1189–
1197, 2020, doi: 10.11591/eei.v9i3.1783.
[52] M. Z. Alksasbeh et al., “Smart hand gestures recognition using K-NN based algorithm for video annotation
purposes,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 21, no. 1, pp. 242–
252, 2021, doi: 10.11591/ijeecs.v21.i1.pp242-252.
[53] H. M. Salman, A. K. M. Al-Qurabat and A. A. R. Finjan, “Bigradient neural network-based quantum particle
swarm optimization for blind source separation,” IAES International Journal of Artificial Intelligence (IJAI), vol.
10, no. 2, pp. 355–364, 2021, doi: 10.11591/ijai.v10.i2.pp355-364.
[54] A. Parveen, Z. H. Khan and S. N. Ahmad, “Classification and evaluation of digital forensic tools,” TELKOMNIKA
Telecommunication, Computing, Electronics and Control, vol. 18, no. 6, pp. 3096–3106, 2020, doi:
10.12928/telkomnika.v18i6.15295.
[55] S. Xu et al., “The fuzzy comprehensive evaluation (FCE) and the principal component analysis (PCA) model
simulation and its applications in water quality assessment of Nansi Lake Basin, China,” Environmental
Engineering Research, vol. 26, no. 2, pp. 222–232, 2021, doi: 10.4491/eer.2020.022.
[56] G. Gorrell and B. Webb, “Generalized hebbian algorithm for incremental latent semantic analysis,” Ninth European
Conference on Speech Communication and Technology, 2005.
[57] A. H. Ali and M. Z. Abdullah, “A parallel grid optimization of SVM hyperparameter for big data classification
using spark Radoop,” Karbala International Journal of Modern Science, vol. 6, no. 1, article 3, pp. 1-18, 2020, doi:
10.33640/2405-609X.1270.
[58] A. H. Ali and M. Z. Abdullah, “A novel approach for big data classification based on hybrid parallel dimensionality
reduction using spark cluster,” Computer Science, vol. 20, no. 4, 2019, doi: 10.7494/csci.2019.20.4.3373.
[59] R. Baklouti, M. Mansouri, M. Nounou, Z. Ben Messaoud and A. Ben Hamida, “Generalized Hebbian Algorithm for
fault detection of CSTR model,” 2016 2nd International Conference on Advanced Technologies for Signal and
Image Processing (ATSIP), 2016, pp. 421-424, doi: 10.1109/ATSIP.2016.7523127.
[60] K. Song, B. Zhang, W. Li, L. Yan and X. Wang, “Research on parallel principal component analysis based on
ternary optical computer,” Optik, vol. 241, 2021, doi: 10.1016/j.ijleo.2021.167176.
[61] M. K. Alsmadi, M. Tayfour, R. A. Alkhasawneh, U. Badawi, I. Almarashdeh and F. Haddad, “Robust feature
extraction methods for general fish classification,” International Journal of Electrical and Computer Engineering
(IJECE), vol. 9, no. 6, pp. 5192–5204, 2019, doi: 10.11591/ijece.v9i6.pp5192-5204.
[62] M. S. Al_Duais and F. S. Mohamad, “Improved Time Training with Accuracy of Batch Back Propagation
Algorithm Via Dynamic Learning Rate and Dynamic Momentum Factor,” IAES International Journal of Artificial
Intelligence, vol. 7, no. 4, pp. 170-178, 2018, doi: 10.11591/ijai.v7.i4.pp170-178.
[63] M. Jupri and R. Sarno, “Data mining, fuzzy AHP and TOPSIS for optimizing taxpayer supervision,” Indonesian
10.11591/ijeecs.v18.i1.pp75-87.
[64] S. Mohamed and A. Ezzati, “A data mining process using classification techniques for employability prediction,”
2019, doi: 10.11591/ijeecs.v14.i2.pp1025-1029.
[65] E. B. B. Palad, M. J. F. Burden, C. R. Dela Torre and R. B. C. Uy, “Performance evaluation of decision tree
classification algorithms using fraud datasets,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9,
no. 6, pp. 2518–2525, 2020, doi: 10.11591/eei.v9i6.2630.
[66] L. M. Padirayon, M. S. Atayan, J. S. Panelo and C. R. Fagela Jr., “Mining the crime data using naïve Bayes
model,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 23, no. 2, pp. 1084–
1092, 2021, doi: 10.11591/ijeecs.v23.i2.pp1084-1092.
[67] Y. Choubik, A. Mahmoudi, M. M. Himmi and L. El Moudnib, “STA/LTA trigger algorithm implementation on a
seismological dataset using hadoop mapreduce,” IAES International Journal of Artificial Intelligence (IJAI), vol. 9,
no. 2, pp. 269–275, 2020, doi: 10.11591/ijai.v9.i2.pp269-275.

[68] D. A. Jasm, M. M. Hamad and A. T. H. Alrawi, “Deep image mining for convolution neural network,” Indonesian
10.11591/ijeecs.v20.i1.pp347-352.
[69] S. W. Kareem, R. Z. Yousif and S. M. J. Abdalwahid, “An approach for enhancing data confidentiality in hadoop,”
2020, doi: 10.11591/ijeecs.v20.i3.pp1547-1555.
[70] E. E. Abel, A. L. M. Shafie and W. H. Chan, “Deployment of internet of things-based cloudlet-cloud for
surveillance operations,” IAES International Journal of Artificial Intelligence (IJAI), vol. 10, no. 1, pp. 24–34,
2021, doi: 10.11591/ijai.v10.i1.pp24-34.
[71] S. Abed, L. Waleed, G. Aldamkhi and K. Hadi, “Enhancement in data security and integrity using minhash
technique,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 21, no. 3, pp.
1739–1750, 2021, doi: 10.11591/ijeecs.v21.i3.pp1739-1750.
[72] S. M. Mohammed, K. Jacksi and S. R. M. Zeebaree, “A state-of-the-art survey on semantic similarity for document
clustering using GloVe and density-based algorithms,” Indonesian Journal of Electrical Engineering and Computer
Science (IJEECS), vol. 22, no. 1, pp. 552–562, 2021, doi: 10.11591/ijeecs.v22.i1.pp552-562.
[73] A. Joshi and S. D. Munisamy, “Enhancement of cloud performance metrics using dynamic degree memory
balanced allocation algorithm,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS),
[74] N. M. M. Sobran, M. M. Salmi, M. B. Bahar, M. N. Othman and S. H. Johari, “Fuzzy Takagi-Sugeno Method in
Microcontroller Based Water Tank System,” International Journal of Robotics and Automation (IJRA), vol. 7, no.
1, pp. 1–7, 2018, doi: 10.11591/ijra.v7i1.pp1-7.
[75] M. A. I. Al Jewari, A. Jidin, S. A. A. Tarusan and M. Rasheed, “Implementation of SVM for five-level cascaded H-
Bridge multilevel inverters utilizing FPGA,” International Journal of Power Electronics and Drive Systems
(IJPEDS), vol. 11, no. 3, pp. 1132-1144, 2020, doi: 10.11591/ijpeds.v11.i3.pp1132-1144.
[76] M. A. Mohammed, I. A. Mohammed, R. A. Hasan, N. Ţăpuş, A. H. Ali and O. A. Hammood, “Green Energy
Sources: Issues and Challenges,” 2019 18th RoEduNet Conference: Networking in Education and Research
(RoEduNet), 2019, pp. 1-8, doi: 10.1109/ROEDUNET.2019.8909595.
[77] M. A. Mohammed, Z. H. Salih, N. Ţăpuş and R. A. K. Hasan, “Security and accountability for sharing the data
stored in the cloud,” in 2016 15th RoEduNet Conference: Networking in Education and Research, 2016, pp. 1–5.
[78] M. A. Mohammed and N. Ţăpuş, “A novel approach of reducing energy consumption by utilizing enthalpy in
mobile cloud computing,” Studies in Informatics and Control, vol. 26, no. 4, pp. 425–434, 2017, doi:
10.24846/v26i4y201706.
[79] N. Q. Mohammed, M. S. Ahmed, M. A. Mohammed, O. A. Hammood, H. A. N. Alshara and A. A. Kamil,
“Comparative Analysis between Solar and Wind Turbine Energy Sources in IoT Based on Economical and
Efficiency Considerations,” 2019 22nd International Conference on Control Systems and Computer Science
(CSCS), 2019, pp. 448-452, doi: 10.1109/CSCS.2019.00082.
[80] R. A. I. Alhayali, M. A. Ahmed, Y. M. Mohialden and A. H. Ali, “Efficient method for breast cancer classification
based on ensemble hoffeding tree and naïve Bayes,” Indonesian Journal of Electrical Engineering and Computer
Science, vol. 18, no. 2, pp. 1074–1080, 2020, doi: 10.11591/ijeecs.v18.i2.pp1074-1080.
[81] Z. H. Salih, G. T. Hasan and M. A. Mohammed, “Investigate and analyze the levels of electromagnetic radiations
emitted from underground power cables extended in modern cities,” 2017 9th International Conference on
Electronics, Computers and Artificial Intelligence (ECAI), 2017, pp. 1-4, doi: 10.1109/ECAI.2017.8166452.
[82] Z. H. Salih, G. T. Hasan, M. A. Mohammed, M. A. S. Klib, A. H. Ali and R. A. Ibrahim, “Study the Effect of
Integrating the Solar Energy Source on Stability of Electrical Distribution System,” 2019 22nd International
Conference on Control Systems and Computer Science (CSCS), 2019, pp. 443-447, doi:
10.1109/CSCS.2019.00081.
[83] N. D. Zaki, N. Y. Hashim, Y. M. Mohialden, M. A. Mohammed, T. Sutikno and A. H. Ali, “A real-time big data
sentiment analysis for iraqi tweets using spark streaming,” Bulletin of Electrical Engineering and Informatics
(BEEI), vol. 9, no. 4, pp. 1411–1419, 2020, doi: 10.11591/eei.v9i4.1897.
[84] M. Pradhan, “Evolutionary computational algorithm by blending of PPCA and EP-Enhanced supervised classifier
for microarray gene expression data,” IAES International Journal of Artificial Intelligence (IJAI), vol. 7, no. 2, pp.
95–104, 2018, doi: 10.11591/ijai.v7.i2.pp95-104.
[85] E. A. Gheni and Z. M. Algelal, “Human face recognition methods based on principle component analysis (PCA),
wavelet and support vector machine (SVM): a comparative study,” Indonesian Journal of Electrical Engineering
and Computer Science (IJEECS), vol. 20, no. 2, pp. 991–999, 2020, doi: 10.11591/ijeecs.v20.i2.pp991-999.
[86] P. V. Kumar and K. M. Jeevan, “Face recognition with frame size reduction and DCT compression using PCA
algorithm,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 1, pp. 168–
178, 2021, doi: 10.11591/ijeecs.v21.i4.pp168-178.
[87] C. Darujati, S. M. Susiki Nugroho, D. Kurniawan and M. Hariadi, “Enhancing the feature-based 3D deformable
face recognition using hybrid PCA-NN,” Indonesian Journal of Electrical Engineering and Computer Science
[88] N. M. Hussien, Y. M. Mohialden, N. T. Ahmed, M. A. Mohammed and T. Sutikno, “A smart gas leakage
monitoring system for use in hospitals,” Indonesian Journal of Electrical Engineering and Computer Science
(IJEECS), vol. 19, no. 2, pp. 1048–1054, 2020, doi: 10.11591/ijeecs.v19.i2.pp1048-1054.
