IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 1, March 2024, pp. 451~458
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i1.pp451-458
Journal homepage: http://ijai.iaescore.com
Optimizer algorithms and convolutional neural networks for
text classification
Mohammed Qorich, Rajae El Ouazzani
Image Laboratory, ISNET Team, School of Technology, Moulay Ismail University of Meknes, Meknes, Morocco
Article Info
Article history:
Received Dec 2, 2022
Revised Feb 3, 2023
Accepted Mar 10, 2023
ABSTRACT
Lately, deep learning has improved the algorithms and architectures of
several natural language processing (NLP) tasks. Even so, the
performance of any deep learning model is strongly affected by the chosen
optimizer algorithm, which updates the model parameters, finds the
optimal weights, and minimizes the value of the loss function. Thus, this
paper proposes a new convolutional neural network (CNN) architecture for
text classification (TC) and sentiment analysis and pairs it with various
optimizer algorithms from the literature. In NLP, and particularly for
sentiment classification, more empirical experiments are needed to
increase the probability of selecting the pertinent optimizer. Hence, we have
evaluated various optimizers on three sizes of text review datasets: small,
medium, and large. Thereby, we examined the optimizers with respect to the
data amount, and we implemented our CNN model on three different
sentiment analysis datasets in order to assign binary labels to text reviews. The
experimental results illustrate that the adaptive optimization algorithms Adam
and root mean square propagation (RMSprop) surpassed the other
optimizers. Moreover, our best CNN model, which employed the RMSprop
optimizer, achieved 90.48% accuracy and surpassed the state-of-the-art
CNN models for binary sentiment classification problems.
Keywords:
Convolutional neural network
Deep learning
Natural language processing
Optimization algorithms
Sentiment analysis
Text classification
This is an open access article under the CC BY-SA license.
Corresponding Author:
Mohammed Qorich
Image Laboratory, ISNET Team, High School of Technology, Moulay Ismail University of Meknes
Meknes, Morocco
Email: mohamedqorich@gmail.com
1. INTRODUCTION
Recently, text classification (TC) has attracted crucial interest in the natural language processing (NLP)
field owing to advances in deep learning research [1]. TC is the task of assigning labels to
given text data based on feature selection [2], and it is used in numerous applications such as spam detection,
topic labeling, question answering, and sentiment analysis [1]. The latter designates the task of identifying the
polarity of review and opinion texts through either multi-class or binary classification [3]. Using deep learning
algorithms, many researchers have proposed different methods and architectures to substantially improve performance
on TC and sentiment analysis problems. Pal et al. [4] and Chamekh et al. [5] have adopted recurrent
neural network (RNN) and long short-term memory (LSTM) models. Meanwhile, Sachin et al. [6] and
Zulqarnain et al. [7] have employed gated recurrent units (GRU). Besides, Kim et al. [8] and Feng et al. [9]
have implemented convolutional neural network (CNN) models, while Jain et al. [10] and Rehman et al. [11]
have proposed hybrid models using CNN and RNN layers.
Despite the efforts made on these topics, these problems still need more experimental investigation. In practice,
the efficiency of a deep learning model relies not only on the architectures, layers, and activation functions used,
but also on the selection of an appropriate optimizer [12]. In effect, the choice of an optimizer almost always rests
on best practices, online recommendations, or even random selection, rather than on an empirical
evidence approach, due to the insufficiency of experiments [12]. Hence, in this paper we propose a new CNN
architecture to classify text reviews as positive or negative; then, we apply multiple deep
learning optimizers to our CNN so as to determine the most relevant optimization algorithm for such a
classification. Also, we have trained our model on three different datasets and examined the
performance of our model with each optimizer using the accuracy metric.
The contributions of our paper are as follows: i) a new CNN architecture for binary classification of
text reviews; ii) our CNN model reaches good accuracy and strong performance against state-of-the-art
models; and iii) the adaptive optimizer algorithms perform better than the other optimizers when used with
the new CNN for text review classification.
The remainder of the paper is organized as follows: i) Section 2 reviews some related
studies that employed CNN models for the TC problem; ii) Section 3 presents our deep learning CNN
architecture, the datasets, some model settings, and the implemented optimizers; iii) Section 4 illustrates
the experimental results; and iv) lastly, we conclude the paper and present some perspectives.
2. RELATED WORK
In this section, we explore some state-of-the-art studies that utilized deep learning algorithms and
CNN models for TC purposes. CNNs have become a common and efficient model architecture for TC
problems [13]. Over the years, many researchers have proposed different CNN-based models aiming to extract
features from text and predict the intended labels. Kalchbrenner et al. [14] have suggested a model called the
dynamic CNN (DCNN). As Figure 1 shows, the DCNN involves an embedding layer that builds a sentence
matrix from the words of a given sentence. Then, wide convolutional layers and dynamic pooling layers
map over the sentence to produce feature relations between its words. In practice, the dynamic
k-max-pooling parameter takes its value based on the sentence length and the position of the convolutional layer.
Figure 1. The DCNN architecture [14]
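As a concrete illustration, the dynamic k-max-pooling schedule can be computed as in the short sketch below; this follows our reading of Kalchbrenner et al. [14], and the variable names are ours.

```python
# Dynamic k for convolutional layer l out of L total layers, over a sentence
# of length s, with k_top the fixed pooling size of the topmost layer:
#   k_l = max(k_top, ceil((L - l) / L * s))
from math import ceil

def dynamic_k(l: int, L: int, s: int, k_top: int) -> int:
    """Return the k-max-pooling parameter for convolutional layer l."""
    return max(k_top, ceil((L - l) / L * s))

# Example: first of three layers, 18-word sentence, k_top = 3 -> k = 12
print(dynamic_k(1, 3, 18, 3))
```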
Next, Kim [15] has presented a light CNN architecture, illustrated in Figure 2, based on a single
convolutional layer with several filters for the TC problem. Kim's model contains an embedding layer, a
convolutional and a max pooling layer, followed by a fully connected layer with dropout, plus a softmax output.
Effectively, the author used the unsupervised embedding model word2vec, and he compared four
initialization approaches for learning the word embeddings. All of Kim's approaches have advanced research
on TC and sentiment analysis problems with CNNs [13].
Figure 2. The Kim’s CNN architecture [15]
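To make the pattern concrete, here is a minimal TensorFlow/Keras sketch of this single-convolutional-layer design with parallel filter windows; the window sizes (3, 4, 5) and 100 feature maps per window follow Kim's reported settings, while the vocabulary and sequence sizes below are placeholders, not values from [15].

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(500,))                        # token ids (length assumed)
emb = tf.keras.layers.Embedding(5000, 150)(inputs)           # sizes are placeholders
branches = [
    tf.keras.layers.GlobalMaxPooling1D()(
        tf.keras.layers.Conv1D(100, k, activation="relu")(emb))
    for k in (3, 4, 5)                                       # parallel filter windows
]
merged = tf.keras.layers.Dropout(0.5)(tf.keras.layers.Concatenate()(branches))
outputs = tf.keras.layers.Dense(2, activation="softmax")(merged)
kim_cnn = tf.keras.Model(inputs, outputs)
```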
In fact, there have been many attempts to improve Kim's architecture. Johnson and Zhang [16]
have trained embeddings of small regions of text on unlabeled text data; the embeddings have then been
fed to the CNN model together with labeled data for TC. The same authors have also suggested a deep pyramid CNN
(DPCNN) [17], which deepens the network to increase its performance while keeping the computational
complexity low. Later, Liao et al. [18] have converted the input sentences into matrices, where each sentence
matrix is represented by word vectors that form the embeddings for the CNN architecture. The
proposed CNN was able to understand sentiments from tweets. Afterwards, [8], [19] have improved the
CNN architecture by using consecutive convolutional layers, and they have reached good accuracies
for sentiment classification. Besides, [20] and [21] have examined different CNN settings to find the optimal
CNN configuration and to improve performance on TC. On the other hand, several recent studies [10],
[11], [22] have merged CNN with LSTM for TC purposes; a minimal sketch of this hybrid pattern is given
below. The authors have fed the convolutional layers of the CNN with word embeddings; the output has then
been passed to LSTM layers in order to learn long-term dependencies between words. Finally, a softmax
layer takes the output from the LSTM layers and produces the classification result.
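A hedged sketch of this CNN-LSTM pattern, assuming illustrative layer sizes rather than the settings of any particular study:

```python
import tensorflow as tf

hybrid = tf.keras.Sequential([
    tf.keras.layers.Embedding(5000, 150),                # word embeddings
    tf.keras.layers.Conv1D(128, 5, activation="relu"),   # local n-gram features
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.LSTM(64),                            # long-term dependencies
    tf.keras.layers.Dense(2, activation="softmax"),      # classification result
])
```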
3. RESEARCH METHOD
3.1. The proposed network
Our suggested CNN model, depicted in Figure 3, employs two convolutional layers, a max
pooling layer, and two fully connected layers. We start by tokenizing our training data through a
vocabulary file that contains the most frequent words. Then, we randomly initialize the embedding layer
so that it extracts meaningful features during the training process. In practice, the embedding layer receives the
input words and produces feature values for them; each word is then grouped according to its learned meaning.
Afterward, the two convolutional layers take the output of the embedding layer, slide a window of a given kernel
size, and apply filters to every window in order to collect more features. We append a dropout
layer to each convolutional layer so as to discard non-optimal features. Next, the max-pooling layer selects the
maximum values from the convolutional layers and passes this output to two fully connected layers. We
apply a dropout layer to the first fully connected layer to avoid overfitting, together with a rectified linear unit
(ReLU) activation function to keep computation efficient. The second fully connected layer then produces
the output vector that carries the positive or negative classification value. Finally, a softmax function predicts
the label based on the probability computed for each class. Regarding the optimization, we applied binary
cross entropy as the loss function; we then compared a set of optimizers with our suggested CNN to identify
the best-performing models. The results of our network architecture with several optimizer algorithms are
presented in the next section.
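For reference, the binary cross entropy for a single review with true label $y \in \{0, 1\}$ and predicted positive-class probability $p$ is

$$\mathcal{L}(y, p) = -\left[\, y \log p + (1 - y) \log(1 - p) \,\right],$$

and the training loss is its average over the batch.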
3.2. Experiments
Our experiments were performed with Python and the TensorFlow framework in a Google Colab notebook
using a Google Compute Engine backend in central processing unit (CPU) mode with 12.68 GB of memory.
We implemented our CNN model to classify reviews as positive or negative on three
popular datasets in TC: Amazon reviews [23], internet movie database (IMDb) movie reviews [24], [25], and
Rotten Tomatoes movie reviews [24], [26]. In addition, we experimented with a set of optimizers on our CNN
model to determine, through empirical examination, the best optimizer for TC and sentiment analysis problems.
Figure 3. The proposed CNN model for text classification
We describe the implemented optimizer algorithms as follows:
− Gradient descent (GD) [27]: the most renowned optimization algorithm employed to train neural
networks. GD uses calculus to adjust the parameter values iteratively so as to reach a local minimum. It
computes the gradient of the cost function over some portion of the dataset. In effect, there are three
types of GD: (1) batch gradient descent, which computes the gradient of the cost function over the complete
dataset; (2) stochastic gradient descent (SGD), which adapts the parameters for every training
sample; and (3) mini-batch gradient descent, which updates the parameters for each mini-batch.
− Momentum [28]: SGD takes a noisier and steeper path than GD because the parameters change with
every training example, which slows convergence toward the optimal minimum. The momentum
algorithm overcomes this problem by adding a fraction of the previous update to the current one,
so the process accelerates across time steps and becomes faster.
− Adagrad [29]: a gradient-based optimizer that adapts the learning rate according to how frequently
parameters are updated. The more a parameter changes, the smaller its learning-rate updates become. In
other words, it makes small learning-rate updates for frequent parameters and large updates for the
infrequent ones. Therefore, it is widely used when training on sparse data.
− AdagradDA [29]: Adagrad dual averaging, an Adagrad-based algorithm. This optimizer
adjusts the regularization of unseen features on each mini-batch. AdagradDA is mainly applied
to linear models with large sparsity.
− Adadelta [30], [31]: an extension of the Adagrad optimizer that addresses its decaying
learning rate, whereby the model can keep learning features. In practice, the algorithm
accumulates past gradients over a fixed-size window.
− FTRL [32]: “Follow the (proximally) regularized leader”, a GD-based algorithm with an alternative
representation of the L1 regularization and the model coefficients. The optimizer uses a per-coordinate
learning rate; besides, it has good sparsity and convergence properties.
− Root mean squared propagation (RMSprop) [33]: an unpublished optimizer suggested by Geoff Hinton
in a Coursera class [33]. The optimizer is based on an adaptive learning rate method. Similar to Adadelta,
RMSprop counters the monotonically decreasing learning rate and accelerates the optimization. In effect,
the algorithm divides the learning rate by an exponentially decaying average of squared gradients.
− Adam [34]: adaptive moment estimation, an optimizer that employs adaptive learning rates to update
every network weight parameter. Adam is an extension of SGD that inherits
features from Adagrad and RMSprop. Effectively, it requires less parameter tuning and has lower memory
requirements. Furthermore, it is widely used to solve non-convex problems with large datasets in a short
running time.
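As an illustration, the compared optimizers can be instantiated in TensorFlow 2 roughly as follows; this is a sketch rather than the authors' code, and the momentum coefficient is an assumption. AdagradDA has no tf.keras equivalent (it is available as tf.compat.v1.train.AdagradDAOptimizer), so it is omitted here.

```python
import tensorflow as tf

LR = 1e-3  # the common initial learning rate used in the paper
optimizers = {
    "Gradient-descent": tf.keras.optimizers.SGD(learning_rate=LR),
    "Momentum": tf.keras.optimizers.SGD(learning_rate=LR, momentum=0.9),  # 0.9 assumed
    "Adagrad": tf.keras.optimizers.Adagrad(learning_rate=LR),
    "Adadelta": tf.keras.optimizers.Adadelta(learning_rate=LR),
    "FTRL": tf.keras.optimizers.Ftrl(learning_rate=LR),
    "RMSprop": tf.keras.optimizers.RMSprop(learning_rate=LR),
    "Adam": tf.keras.optimizers.Adam(learning_rate=LR),
}
```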
Next, we present the parameter values used in our CNN model:
− For the embedding layer, we employed an embedding dimension Ed=150, a sequence length S=500,
and a maximum vocabulary size vocab_size=5,000.
− For the convolutional layers, we set the kernel sizes to k1=5 and k2=3, and the numbers of
filters to F1=256 and F2=128.
− For the fully connected layers, we set the numbers of hidden units to H1=128 and H2=164.
For the optimization, we set the dropout rate to 0.5 and initialized all the optimizer
algorithms with a learning rate of 1e-3.
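Putting these settings together, a hedged TensorFlow/Keras sketch of the proposed network could look like the following. The layer ordering and the final two-unit softmax are our reading of the description in Section 3.1; the paper's H2=164 setting does not map cleanly onto that description, so a single hidden dense layer is kept here.

```python
import tensorflow as tf

def build_model(vocab_size=5000, emb_dim=150):
    # Inputs are padded token-id sequences of length S=500 (see Section 3.3).
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, emb_dim),
        tf.keras.layers.Conv1D(256, 5, activation="relu"),  # F1=256, k1=5
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Conv1D(128, 3, activation="relu"),  # F2=128, k2=3
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(128, activation="relu"),      # H1=128, with ReLU
        tf.keras.layers.Dropout(0.5),                       # dropout rate 0.5
        tf.keras.layers.Dense(2, activation="softmax"),     # positive / negative
    ])
```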
3.3. Datasets
In the current section, we provide some information on the data used. We applied all
our CNN models to three text review datasets of different sizes. As shown in Table 1, we classified
the datasets into large, medium, and small according to the number of reviews. More details on the datasets are given below:
− Amazon reviews [23] contains 4,000,000 customer reviews, collected up to March 2013, about several product
categories. The data is labeled into two classes based on review scores from 1 to 5 stars:
‘Positive’ covers 5- and 4-star reviews, and ‘Negative’ covers 1- and 2-star reviews. For our experiments, we
employed 100,000 reviews from this data, which represents the large dataset type.
− The Rotten Tomatoes (RT) reviews dataset [24], [26] includes 5,331 positive snippets of RT movie
reviews and 5,331 negative ones. RT was first used in Pang/Lee ACL 2005 [26], and it is a medium
dataset in comparison with the previous one.
− IMDB reviews [24], [25] comprises about 2,000 sentiment-labeled movie reviews. The data contains 1,000
negative and 1,000 positive reviews, introduced in Pang/Lee ACL 2004 [25]. The IMDB movie reviews
are considered a small dataset.
Table 1. Number of reviews and types of data in the three datasets
Data                  Type             Number of reviews
Amazon reviews [23]   Large dataset    100,000
RT reviews [24]       Medium dataset   10,662
IMDB reviews [24]     Small dataset    2,000
Practically, we split each dataset into three sets: train, validation, and test. Then, we pre-processed
our training data using several text filters in order to remove noisy content. Afterwards, we built the label
and the content from each input sentence to train our models. The results of each model are presented and
described in the next section.
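A hedged sketch of the described preparation follows; the filters, split ratios, and tokenizer settings below are assumptions rather than the authors' exact pipeline.

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split

def prepare(texts, labels, vocab_size=5000, seq_len=500):
    # Keep the vocab_size most frequent words, as in Section 3.2.
    tok = tf.keras.preprocessing.text.Tokenizer(num_words=vocab_size)
    tok.fit_on_texts(texts)
    x = tf.keras.preprocessing.sequence.pad_sequences(
        tok.texts_to_sequences(texts), maxlen=seq_len)
    # An 80/10/10 split is assumed; the paper does not state the ratios.
    x_tr, x_tmp, y_tr, y_tmp = train_test_split(x, labels, test_size=0.2)
    x_val, x_te, y_val, y_te = train_test_split(x_tmp, y_tmp, test_size=0.5)
    return (x_tr, y_tr), (x_val, y_val), (x_te, y_te)
```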
4. RESULTS AND DISCUSSION
In the current section, we describe the results obtained using different optimizer algorithms with our
CNN architecture. As shown in Table 2, we applied each optimizer model to the three types of datasets:
large, medium, and small, to explore the impact of the data amount on each optimizer's efficiency. We
evaluated the efficiency of the optimizer models using the accuracy metric.
For readability, we named the models after the optimizer selected for the CNN architecture. For
example, CNN-Gradient-descent denotes the model that employed gradient descent as the optimizer of
the CNN model. As the loss function, we used cross entropy in all models. The results
illustrate that the best CNN model on the small and medium datasets is CNN-Adam. However, CNN-RMSprop
surpassed the CNN-Adam model on the large dataset, despite the good accuracy reached by CNN-Adam.
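In implementation terms, the comparison reduces to training one fresh model per optimizer under identical settings. The sketch below reuses build_model() from Section 3.2 and the optimizers dictionary shown earlier, together with the splits from prepare() in Section 3.3; the batch size is an assumption.

```python
results = {}
for name, opt in optimizers.items():
    model = build_model()                                    # fresh weights per optimizer
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",    # two-class cross entropy
                  metrics=["accuracy"])
    model.fit(x_tr, y_tr, validation_data=(x_val, y_val),
              epochs=100, batch_size=64, verbose=0)          # batch size assumed
    results[f"CNN-{name}"] = model.evaluate(x_te, y_te, verbose=0)[1]
```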
Table 2. The results of the optimizer models for each dataset type
Model                  Accuracy (%)
                       Small dataset   Medium dataset   Large dataset
CNN-Gradient-descent   50              50.75            52.85
CNN-Momentum           50              54.03            66.33
CNN-Adagrad            51.5            51.31            50.69
CNN-AdagradDA          51              50               50
CNN-Adadelta           50              50               50.46
CNN-FTRL               50              50               50
CNN-RMSprop            60.75           71.15            90.48
CNN-Adam               70.5            74.58            90.01
In practice, we notice that the accuracy of the CNN-Gradient-descent model is poor and improves only in
small steps, which means that the model learned slowly even as the data amount grew. In contrast,
with the CNN-Momentum model, the accuracy increased and the model performed better. In effect, the
momentum method accelerates GD in the pertinent directions and reduces oscillations. On the other hand, the
remaining optimizers attained inadequate performance, and their accuracy stayed nearly constant across the three
dataset types. Meanwhile, the CNN-RMSprop model achieved the best accuracy on the large dataset and
outperformed every model except CNN-Adam on the small and medium datasets. The RMSprop optimizer converges
faster and requires less parameter tuning than the GD algorithms and their variants. Incidentally, CNN-Adam
displayed its advantage over all the other optimizers and achieved high accuracy regardless of the
data amount. In fact, the Adam optimizer combines the advantages of several optimizer algorithms and surpasses
the others in terms of computation time, parameter requirements, and ease of implementation.
Figures 4 and 5 show the validation accuracy on the three datasets for our two best-performing models,
CNN-Adam and CNN-RMSprop. We notice that the larger the dataset, the more the accuracy
increases and the better the performance. Also, both optimizers with our CNN architecture made
good progress and traced smooth accuracy curves over the first 100 epochs.
Figure 4. The CNN-Adam validation accuracy
Figure 5. The CNN-RMSprop validation accuracy
On the other hand, Table 3 compares our best CNN model with other state-of-the-art CNN models
using the accuracy metric. The CNN-RMSprop model, which implements the RMSprop optimizer with our
CNN architecture, obtained the best efficiency with 90.48% accuracy. This
model classified the text reviews from the Amazon reviews dataset as negative or positive, and surpassed
most state-of-the-art CNN binary classification models.
Table 3. Comparison of the results of our best CNN model and some related work models
Model                            Accuracy (%)
Chen and Wang's CNN [22]         76.09
Kim and Jeong's CNN [8]          81.06
Kim's CNN [15]                   81.5
Johnson and Zhang's CNN [16]     85.7
Feng and Cheng's CNN [9]         86.32
DCNN [14]                        86.8
Rehman et al.'s CNN [11]         87
Jain et al.'s CNN [10]           87.1
CNN-RMSprop (our model)          90.48
5. CONCLUSION AND PERSPECTIVES
In this paper, we suggest a new CNN architecture to classify text reviews as negative or
positive. With the suggested model, we examined a set of optimizer algorithms to determine empirically the best
optimizer for the sentiment analysis problem. The experiments showed that RMSprop and Adam yield the most
efficient models. Moreover, we obtained strong performance compared with the aforementioned state-of-the-art
architectures by reaching an accuracy of 90.48%. As a perspective, we plan to apply our model to
multi-class text classification problems and to further improve the architecture's performance.
REFERENCES
[1] Q. Li et al., “A survey on text classification: From shallow to deep learning,” 2020, [Online]. Available:
http://arxiv.org/abs/2008.00364.
[2] A. Gasparetto, M. Marcuzzo, A. Zangari, and A. Albarelli, “Survey on text classification algorithms: From text to predictions,”
Information (Switzerland), vol. 13, no. 2, 2022, doi: 10.3390/info13020083.
[3] N. C. Dang, M. N. Moreno-García, and F. De la Prieta, “Sentiment analysis based on deep learning: A comparative study,”
Electronics (Switzerland), vol. 9, no. 3, 2020, doi: 10.3390/electronics9030483.
[4] S. Pal, S. Ghosh, and A. Nag, “Sentiment analysis in the light of LSTM recurrent neural networks,” International Journal of
Synthetic Emotions, vol. 9, no. 1, pp. 33–39, 2018, doi: 10.4018/ijse.2018010103.
[5] A. Chamekh, M. Mahfoudh, and G. Forestier, “Sentiment analysis based on deep learning in E-commerce,” Lecture Notes in
Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13369
LNAI, pp. 498–507, 2022, doi: 10.1007/978-3-031-10986-7_40.
[6] S. Sachin, A. Tripathi, N. Mahajan, S. Aggarwal, and P. Nagrath, “Sentiment analysis using gated recurrent neural networks,” SN
Computer Science, vol. 1, no. 2, 2020, doi: 10.1007/s42979-020-0076-y.
[7] M. Zulqarnain, R. Ghazali, M. Aamir, and Y. M. M. Hassim, “An efficient two-state GRU based on feature attention mechanism
for sentiment analysis,” Multimedia Tools and Applications, 2022, doi: 10.1007/s11042-022-13339-4.
[8] H. Kim and Y. S. Jeong, “Sentiment classification using Convolutional Neural Networks,” Applied Sciences (Switzerland), vol. 9,
no. 11, 2019, doi: 10.3390/app9112347.
[9] Y. Feng and Y. Cheng, “Short text sentiment analysis based on multi-channel CNN with multi-head attention mechanism,” IEEE
Access, vol. 9, pp. 19854–19863, 2021, doi: 10.1109/ACCESS.2021.3054521.
[10] P. K. Jain, V. Saravanan, and R. Pamula, “A hybrid CNN-LSTM: A deep learning approach for consumer sentiment analysis using
qualitative user-generated contents,” ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 20,
no. 5, 2021, doi: 10.1145/3457206.
[11] A. U. Rehman, A. K. Malik, B. Raza, and W. Ali, “A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment
analysis,” Multimedia Tools and Applications, vol. 78, no. 18, pp. 26597–26613, 2019, doi: 10.1007/s11042-019-07788-7.
[12] S. M. Zaman, M. M. Hasan, R. I. Sakline, D. Das, and M. A. Alam, “A comparative analysis of optimizers in recurrent neural
networks for text classification,” 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2021,
2021, doi: 10.1109/CSDE53843.2021.9718394.
[13] S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu, and J. Gao, “Deep learning based text classification: A
comprehensive review,” 2020, [Online]. Available: http://arxiv.org/abs/2004.03705.
[14] N. Kalchbrenner, E. Grefenstette, and P. Blunsom, “A convolutional neural network for modelling sentences,” 52nd Annual Meeting
of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference, vol. 1, pp. 655–665, 2014, doi:
10.3115/v1/p14-1062.
[15] Y. Kim, “Convolutional neural networks for sentence classification,” EMNLP 2014 - 2014 Conference on Empirical Methods in
Natural Language Processing, Proceedings of the Conference, pp. 1746–1751, 2014, doi: 10.3115/v1/d14-1181.
[16] R. Johnson and T. Zhang, “Semi-supervised convolutional neural networks for text categorization via region embedding,” Advances
in Neural Information Processing Systems, vol. 2015-January, pp. 919–927, 2015.
[17] R. Johnson and T. Zhang, “Deep pyramid convolutional neural networks for text categorization,” ACL 2017 - 55th Annual Meeting
of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), vol. 1, pp. 562–570, 2017, doi:
10.18653/v1/P17-1052.
[18] S. Liao, J. Wang, R. Yu, K. Sato, and Z. Cheng, “CNN for situations understanding based on sentiment analysis of twitter data,”
Procedia Computer Science, vol. 111, pp. 376–381, 2017, doi: 10.1016/j.procs.2017.06.037.
[19] B. M. Mulyo and D. H. Widyantoro, “Aspect-based sentiment analysis approach with CNN,” International Conference on Electrical
Engineering, Computer Science and Informatics (EECSI), vol. 2018-October, pp. 142–147, 2018, doi:
10.1109/EECSI.2018.8752857.
[20] M. Pota, M. Esposito, G. De Pietro, and H. Fujita, “Best practices of convolutional neural networks for question classification,”
Applied Sciences (Switzerland), vol. 10, no. 14, 2020, doi: 10.3390/app10144710.
[21] M. A. Nasichuddin, T. B. Adji, and W. Widyawan, “Performance improvement using CNN for sentiment analysis,” IJITEE
(International Journal of Information Technology and Electrical Engineering), vol. 2, no. 1, 2018, doi: 10.22146/ijitee.36642.
[22] N. Chen and P. Wang, “Advanced combined LSTM-CNN model for Twitter sentiment analysis,” Proceedings of 2018 5th IEEE
International Conference on Cloud Computing and Intelligence Systems, CCIS 2018, pp. 684–687, 2019, doi:
10.1109/CCIS.2018.8691381.
[23] “Amazon 2013,” 2016. [Online]. Available: https://www.kaggle.com/bittlingmayer/amazonreviews.
[24] “IMDB 2004 / Rotten Tomatoes 2005,” 2016. [Online]. Available: https://www.cs.cornell.edu/people/pabo/movie-review-data/.
[25] B. Pang and L. Lee, “A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts,” in
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), pp. 271–278, 2004, doi:
10.48550/arXiv.cs/0409058.
[26] B. Pang and L. Lee, “Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales,” ACL-
05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, pp. 115–124, 2005,
doi: 10.3115/1219840.1219855.
[27] N. Ketkar, “Stochastic Gradient Descent,” Deep Learning with Python, pp. 113–132, 2017.
[28] N. Qian, “On the momentum term in gradient descent learning algorithms,” Neural Networks, vol. 12, no. 1, pp. 145–151, 1999,
doi: 10.1016/S0893-6080(98)00116-6.
[29] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” COLT 2010 -
The 23rd Conference on Learning Theory, pp. 257–269, 2010.
[30] M. D. Zeiler, “ADADELTA: An adaptive learning rate method,” 2012, [Online]. Available: http://arxiv.org/abs/1212.5701.
[31] Y. Wang, J. Liu, J. Misic, V. B. Misic, S. Lv, and X. Chang, “Assessing optimizer impact on DNN model sensitivity to adversarial
examples,” IEEE Access, vol. 7, pp. 152766–152776, 2019, doi: 10.1109/ACCESS.2019.2948658.
[32] H. B. McMahan et al., “Ad click prediction: A view from the trenches,” Proceedings of the ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pp. 1222–1230, 2013, doi: 10.1145/2487575.2488200.
[33] N. Shi, D. Li, M. Hong, and R. Sun, “RMSprop converges with proper hyperparameter,” International Conference on Learning
Representations (ICLR), 2021.
[34] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” 3rd International Conference on Learning
Representations, ICLR 2015 - Conference Track Proceedings, 2015.
BIOGRAPHIES OF AUTHORS
Mohammed Qorich was born in Meknes, Morocco, in 1993. He received the
T.U.D degree in Computer Engineering from the High School of Technology of Meknes,
Moulay Ismail University (Morocco), in 2013, the P.L degree in IT development
from the Faculty of Sciences Ain Chock Casablanca, Hassan II University (Morocco), in 2014,
and the M.S degree in educational technology from the Higher Normal School of Tetouan,
Abdelmalek Essaadi University (Morocco), in 2020. He is currently a Ph.D. candidate at
Moulay Ismail University, Meknes, Morocco. His research interests include natural language
processing, deep learning, text classification, and chatbots. He can be contacted at email:
mohamedqorich@gmail.com.
Rajae El Ouazzani received her master's degree in Computer Science and
Telecommunication from the Mohammed V University of Rabat (Morocco) in 2006 and her
Ph.D. in Image and Video Processing from the High National School of Computer Science and
Systems Analysis (Morocco) in 2010. Since 2011, she has been a Professor at the High School of
Technology of Meknes, Moulay Ismail University, Morocco. Since 2007, she has authored
several papers in international journals and conferences. Her domains of interest include
multimedia data processing and telecommunications. She can be contacted at email:
elouazzanirajae@gmail.com.
Ad

Recommended

PDF
Survey on Text Prediction Techniques
vivatechijri
 
PDF
Eat it, Review it: A New Approach for Review Prediction
vivatechijri
 
PPTX
Seminar dm
MHDAmmarALkelany
 
PDF
Icon18revrec sudeshna
Muthusamy Chelliah
 
PDF
Convolutional neural network
Yan Xu
 
PPTX
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
changedaeoh
 
PDF
Classification of Images Using CNN Model and its Variants
IRJET Journal
 
PDF
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET Journal
 
PDF
Cyber bullying detection and analysis.ppt.pdf
Hunais Abdul Nafi
 
PDF
imageclassification-160206090009.pdf
KammetaJoshna
 
PDF
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Márton Miháltz
 
PPTX
introduction Convolutional Neural Networks.pptx
zikoAr
 
PDF
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Sitakanta Mishra
 
PDF
Hand Written Digit Classification
ijtsrd
 
PDF
SENTIMENT ANALYSIS FOR MOVIES REVIEWS DATASET USING DEEP LEARNING MODELS
IJDKP
 
PDF
Text cnn on acme ugc moderation
Marsan Ma
 
PPTX
Image classification with Deep Neural Networks
Yogendra Tamang
 
PPTX
Talk from NVidia Developer Connect
Anuj Gupta
 
PDF
Deep Learning in Text Recognition and Text Detection : A Review
IRJET Journal
 
PDF
IRJET - Image Classification using CNN
IRJET Journal
 
PDF
IRJET- Extension to Visual Information Narrator using Neural Network
IRJET Journal
 
PDF
A survey on the layers of convolutional Neural Network
Sasanko Sekhar Gantayat
 
PPTX
Deep Learning for Unified Personalized Search and Recommendations - Jake Mann...
Lucidworks
 
PDF
Deep Learning for Search: Personalization and Deep Tokenization
Jake Mannix
 
PDF
Hyper-parameter optimization of convolutional neural network based on particl...
journalBEEI
 
PPTX
Deep learning summary
ankit_ppt
 
PPTX
slidesgo-unlocking-the-power-of-convolutional-neural-networks-a-comprehensive...
Vadim Pinskiy
 
PDF
IRJET- Visual Information Narrator using Neural Network
IRJET Journal
 
PDF
Enhancing road image clarity with residual neural network dehazing model
IAESIJAI
 
PDF
Seeding precision: a mask region based convolutional neural networks classifi...
IAESIJAI
 

More Related Content

Similar to Optimizer algorithms and convolutional neural networks for text classification (20)

PDF
Cyber bullying detection and analysis.ppt.pdf
Hunais Abdul Nafi
 
PDF
imageclassification-160206090009.pdf
KammetaJoshna
 
PDF
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Márton Miháltz
 
PPTX
introduction Convolutional Neural Networks.pptx
zikoAr
 
PDF
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Sitakanta Mishra
 
PDF
Hand Written Digit Classification
ijtsrd
 
PDF
SENTIMENT ANALYSIS FOR MOVIES REVIEWS DATASET USING DEEP LEARNING MODELS
IJDKP
 
PDF
Text cnn on acme ugc moderation
Marsan Ma
 
PPTX
Image classification with Deep Neural Networks
Yogendra Tamang
 
PPTX
Talk from NVidia Developer Connect
Anuj Gupta
 
PDF
Deep Learning in Text Recognition and Text Detection : A Review
IRJET Journal
 
PDF
IRJET - Image Classification using CNN
IRJET Journal
 
PDF
IRJET- Extension to Visual Information Narrator using Neural Network
IRJET Journal
 
PDF
A survey on the layers of convolutional Neural Network
Sasanko Sekhar Gantayat
 
PPTX
Deep Learning for Unified Personalized Search and Recommendations - Jake Mann...
Lucidworks
 
PDF
Deep Learning for Search: Personalization and Deep Tokenization
Jake Mannix
 
PDF
Hyper-parameter optimization of convolutional neural network based on particl...
journalBEEI
 
PPTX
Deep learning summary
ankit_ppt
 
PPTX
slidesgo-unlocking-the-power-of-convolutional-neural-networks-a-comprehensive...
Vadim Pinskiy
 
PDF
IRJET- Visual Information Narrator using Neural Network
IRJET Journal
 
Cyber bullying detection and analysis.ppt.pdf
Hunais Abdul Nafi
 
imageclassification-160206090009.pdf
KammetaJoshna
 
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Márton Miháltz
 
introduction Convolutional Neural Networks.pptx
zikoAr
 
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Sitakanta Mishra
 
Hand Written Digit Classification
ijtsrd
 
SENTIMENT ANALYSIS FOR MOVIES REVIEWS DATASET USING DEEP LEARNING MODELS
IJDKP
 
Text cnn on acme ugc moderation
Marsan Ma
 
Image classification with Deep Neural Networks
Yogendra Tamang
 
Talk from NVidia Developer Connect
Anuj Gupta
 
Deep Learning in Text Recognition and Text Detection : A Review
IRJET Journal
 
IRJET - Image Classification using CNN
IRJET Journal
 
IRJET- Extension to Visual Information Narrator using Neural Network
IRJET Journal
 
A survey on the layers of convolutional Neural Network
Sasanko Sekhar Gantayat
 
Deep Learning for Unified Personalized Search and Recommendations - Jake Mann...
Lucidworks
 
Deep Learning for Search: Personalization and Deep Tokenization
Jake Mannix
 
Hyper-parameter optimization of convolutional neural network based on particl...
journalBEEI
 
Deep learning summary
ankit_ppt
 
slidesgo-unlocking-the-power-of-convolutional-neural-networks-a-comprehensive...
Vadim Pinskiy
 
IRJET- Visual Information Narrator using Neural Network
IRJET Journal
 

More from IAESIJAI (20)

PDF
Enhancing road image clarity with residual neural network dehazing model
IAESIJAI
 
PDF
Seeding precision: a mask region based convolutional neural networks classifi...
IAESIJAI
 
PDF
Obstructive sleep apnea detection based on electrocardiogram signal using one...
IAESIJAI
 
PDF
Enhancing video anomaly detection for human suspicious behavior through deep ...
IAESIJAI
 
PDF
Detecting fraudulent financial statement under imbalanced data using neural n...
IAESIJAI
 
PDF
A novel pairwise based convolutional neural network for image preprocessing e...
IAESIJAI
 
PDF
Hybrid improved fuzzy C-means and watershed segmentation to classify Alzheime...
IAESIJAI
 
PDF
Unified and evolved approach based on neural network and deep learning method...
IAESIJAI
 
PDF
Implementation of fuzzy logic approach for thalassemia screening in children
IAESIJAI
 
PDF
Enhancing interpretability in random forest: Leveraging inTrees for associati...
IAESIJAI
 
PDF
The use of augmented reality in assessing and training children with attentio...
IAESIJAI
 
PDF
Internet of things and blockchain integration for security and privacy
IAESIJAI
 
PDF
Review of cloud computing models in education and the unmet needs
IAESIJAI
 
PDF
Development of a prioritized traffic light control system for emergency vehicles
IAESIJAI
 
PDF
Optimizing the position of photovoltaic solar tracker panels with artificial ...
IAESIJAI
 
PDF
Optimisation of semantic segmentation algorithm for autonomous driving using ...
IAESIJAI
 
PDF
Design and analysis plant factory with artificial light
IAESIJAI
 
PDF
Navigating the tech-savvy generation; key considerations in developing of an ...
IAESIJAI
 
PDF
Enhancing legal research through knowledge-infused information retrieval for...
IAESIJAI
 
PDF
Anartificial intelligence approach to smart exam supervision using YOLO v5 an...
IAESIJAI
 
Enhancing road image clarity with residual neural network dehazing model
IAESIJAI
 
Seeding precision: a mask region based convolutional neural networks classifi...
IAESIJAI
 
Obstructive sleep apnea detection based on electrocardiogram signal using one...
IAESIJAI
 
Enhancing video anomaly detection for human suspicious behavior through deep ...
IAESIJAI
 
Detecting fraudulent financial statement under imbalanced data using neural n...
IAESIJAI
 
A novel pairwise based convolutional neural network for image preprocessing e...
IAESIJAI
 
Hybrid improved fuzzy C-means and watershed segmentation to classify Alzheime...
IAESIJAI
 
Unified and evolved approach based on neural network and deep learning method...
IAESIJAI
 
Implementation of fuzzy logic approach for thalassemia screening in children
IAESIJAI
 
Enhancing interpretability in random forest: Leveraging inTrees for associati...
IAESIJAI
 
The use of augmented reality in assessing and training children with attentio...
IAESIJAI
 
Internet of things and blockchain integration for security and privacy
IAESIJAI
 
Review of cloud computing models in education and the unmet needs
IAESIJAI
 
Development of a prioritized traffic light control system for emergency vehicles
IAESIJAI
 
Optimizing the position of photovoltaic solar tracker panels with artificial ...
IAESIJAI
 
Optimisation of semantic segmentation algorithm for autonomous driving using ...
IAESIJAI
 
Design and analysis plant factory with artificial light
IAESIJAI
 
Navigating the tech-savvy generation; key considerations in developing of an ...
IAESIJAI
 
Enhancing legal research through knowledge-infused information retrieval for...
IAESIJAI
 
Anartificial intelligence approach to smart exam supervision using YOLO v5 an...
IAESIJAI
 
Ad

Recently uploaded (20)

PDF
9-1-1 Addressing: End-to-End Automation Using FME
Safe Software
 
PPTX
Securing Account Lifecycles in the Age of Deepfakes.pptx
FIDO Alliance
 
PDF
2025_06_18 - OpenMetadata Community Meeting.pdf
OpenMetadata
 
PDF
Quantum AI: Where Impossible Becomes Probable
Saikat Basu
 
PDF
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
Priyanka Aash
 
PDF
ReSTIR [DI]: Spatiotemporal reservoir resampling for real-time ray tracing ...
revolcs10
 
PDF
"Scaling in space and time with Temporal", Andriy Lupa.pdf
Fwdays
 
PDF
"Database isolation: how we deal with hundreds of direct connections to the d...
Fwdays
 
PDF
Hyderabad MuleSoft In-Person Meetup (June 21, 2025) Slides
Ravi Tamada
 
PPTX
UserCon Belgium: Honey, VMware increased my bill
stijn40
 
PPTX
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
 
PDF
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
Edge AI and Vision Alliance
 
PDF
A Constitutional Quagmire - Ethical Minefields of AI, Cyber, and Privacy.pdf
Priyanka Aash
 
PDF
Techniques for Automatic Device Identification and Network Assignment.pdf
Priyanka Aash
 
PDF
Database Benchmarking for Performance Masterclass: Session 2 - Data Modeling ...
ScyllaDB
 
DOCX
Daily Lesson Log MATATAG ICT TEchnology 8
LOIDAALMAZAN3
 
PPTX
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
PPTX
Security Tips for Enterprise Azure Solutions
Michele Leroux Bustamante
 
PDF
10 Key Challenges for AI within the EU Data Protection Framework.pdf
Priyanka Aash
 
PDF
The Future of Product Management in AI ERA.pdf
Alyona Owens
 
9-1-1 Addressing: End-to-End Automation Using FME
Safe Software
 
Securing Account Lifecycles in the Age of Deepfakes.pptx
FIDO Alliance
 
2025_06_18 - OpenMetadata Community Meeting.pdf
OpenMetadata
 
Quantum AI: Where Impossible Becomes Probable
Saikat Basu
 
GenAI Opportunities and Challenges - Where 370 Enterprises Are Focusing Now.pdf
Priyanka Aash
 
ReSTIR [DI]: Spatiotemporal reservoir resampling for real-time ray tracing ...
revolcs10
 
"Scaling in space and time with Temporal", Andriy Lupa.pdf
Fwdays
 
"Database isolation: how we deal with hundreds of direct connections to the d...
Fwdays
 
Hyderabad MuleSoft In-Person Meetup (June 21, 2025) Slides
Ravi Tamada
 
UserCon Belgium: Honey, VMware increased my bill
stijn40
 
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
 
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
Edge AI and Vision Alliance
 
A Constitutional Quagmire - Ethical Minefields of AI, Cyber, and Privacy.pdf
Priyanka Aash
 
Techniques for Automatic Device Identification and Network Assignment.pdf
Priyanka Aash
 
Database Benchmarking for Performance Masterclass: Session 2 - Data Modeling ...
ScyllaDB
 
Daily Lesson Log MATATAG ICT TEchnology 8
LOIDAALMAZAN3
 
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
Security Tips for Enterprise Azure Solutions
Michele Leroux Bustamante
 
10 Key Challenges for AI within the EU Data Protection Framework.pdf
Priyanka Aash
 
The Future of Product Management in AI ERA.pdf
Alyona Owens
 
Ad

Optimizer algorithms and convolutional neural networks for text classification

  • 1. IAES International Journal of Artificial Intelligence (IJ-AI) Vol. 13, No. 1, March 2024, pp. 451~458 ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i1.pp451-458  451 Journal homepage: http://ijai.iaescore.com Optimizer algorithms and convolutional neural networks for text classification Mohammed Qorich, Rajae El Ouazzani Image Laboratory, ISNET Team, School of Technology, Moulay Ismail University of Meknes, Meknes, Morocco Article Info ABSTRACT Article history: Received Dec 2, 2022 Revised Feb 3, 2023 Accepted Mar 10, 2023 Lately, deep learning has improved the algorithms and the architectures of several natural language processing (NLP) tasks. In spite of that, the performance of any deep learning model is widely impacted by the used optimizer algorithm; which allows updating the model parameters, finding the optimal weights, and minimizing the value of the loss function. Thus, this paper proposes a new convolutional neural network (CNN) architecture for text classification (TC) and sentiment analysis and uses it with various optimizer algorithms in the literature. Actually, in NLP, and particularly for sentiment classification concerns, the need for more empirical experiments increases the probability of selecting the pertinent optimizer. Hence, we have evaluated various optimizers on three types of text review datasets: small, medium, and large. Thereby, we examined the optimizers regarding the data amount and we have implemented our CNN model on three different sentiment analysis datasets so as to binary label text reviews. The experimental results illustrate that the adaptive optimization algorithms Adam and root mean square propagation (RMSprop) have surpassed the other optimizers. Moreover, our best CNN model which employed the RMSprop optimizer has achieved 90.48% accuracy and surpassed the state-of-the-art CNN models for binary sentiment classification problems. Keywords: Convolutional neural network Deep learning Natural language processing Optimization algorithms Sentiment analysis Text classification This is an open access article under the CC BY-SA license. Corresponding Author: Mohammed Qorich Image Laboratory, ISNET Team, High School of Technology, Moulay Ismail University of Meknes Meknes, Morocco Email: [email protected] 1. INTRODUCTION Recently, text classification (TC) attends a crucial interest in the natural language processing (NLP) field in light of the upgrading in deep learning research [1]. Actually, TC is the task of extracting labels from a given text data based on features selection [2] and it is used in numerous applications such as spam detection, topic labeling, question answering, and sentiment analysis [1]. The latter designates the task of identifying the polarity from reviews and opinions text either by multiple or binary classification [3]. Using deep learning algorithms, many researchers propose different methods and architectures to highly increase the performances in TC and sentiment analysis problems. Pal et al. [4] and Chamekh et al. [5] have adopted recurrent neural networks (RNN) models and long short-term memory (LSTM). Meanwhile, Sachin et al. [6] and Zulqarnain et al. [7] have employed gated recurrent units (GRU). Besides, Kim et al. [8] and Feng et al. [9] have implemented convolutional neural network (CNN) models, while Jain et al. [10] and Rehman et al. [11] have proposed hybrid models using CNN and RNN layers. Despite the efforts made in these topics, these problems need more experimental aspects. 
In practice, the efficiency of a deep learning model not only relies on the used architectures, layers and activation functions, but also on the selection of the appropriate optimizers [12]. In effect, the choice of an optimizer almost stands
  • 2.  ISSN: 2252-8938 Int J Artif Intell, Vol. 13, No. 1, March 2024: 451-458 452 on best practices, online recommendations or even on random selections, and not relies on an empirical evidence approach due to the insufficiency of experiments [12]. Indeed, we proffer in this paper a new CNN architecture to binary classify text reviews into positive and negative, then, we have applied multiple deep learning optimizers in our CNN so as to determine the best and relevant optimization algorithm for a such classification. Also, we have trained our model on three different datasets and we have examined the performance of our model with the optimizers using the accuracy metric. The contributions of our paper are like so: i) A new CNN architecture for a binary classification of text reviews; ii) Our CNN model reaches a good accuracy and great performance against the state-of-the-art models; and iii) The adaptative optimizers algorithms perform better using the new CNN in text reviews classification compared to other optimizers. The remain of the paper is arranged such as; i) Section 2 displays a review of some corresponded studies that employed CNN models for TC problem; ii) In section 3, we proposed our deep learning CNN architecture, the datasets, and some model settings plus the implemented optimizers; iii) Section 4 illustrates the experimental results; and iv) Lastly, we conclude our paper and we present some perspectives. 2. RELATED WORK In this section we explore some state-of-the-art studies which utilized deep learning algorithms and CNN models for TC purposes. Actually, CNNs become a common and an efficient model architecture for TC problems [13]. Over the years, many researchers proposed different CNN-based models aiming to extract features from text and predict the intended labels. Kalchbrenner et al. [14] have suggested a model called dynamic CNN (DCNN). As shows Figure 1, the DCNN involves an embedding layer that builds a sentence matrix for every word in a given sentence. Then, the wide convolutional layers and the dynamic pooling layers map over the sentence to produce a feature relation between the words in the sentence. In practice, the dynamic k-max-pooling parameter takes value based on the sentence length and the position of the convolutional layer. Figure 1. The DCNN architecture [14] Next, Kim [15] have presented a light CNN architecture as illustrated in Figure 2, based on one convolutional layer and filters for TC problem. Actually, the Kim’s model contains an embedding layer, a
  • 3. Int J Artif Intell ISSN: 2252-8938  Optimizer algorithms and convolutional neural networks for text classification (Mohammed Qorich) 453 convolutional and a max pooling layer, followed by a fully connected layer with dropout, plus a softmax output. Effectively, the author has used the unsupervised embedding model word2vec, and he has compared four initialization approaches to learn the word embeddings. All the Kim’s approaches have enhanced the researches in TC and sentiment analysis problems with CNN [13]. Figure 2. The Kim’s CNN architecture [15] In fact, there have been many attempts to improve the Kim’s architecture. Johnson and Zhang [16] have trained the embedding of small parts of text using an unlabeled text data, then the embeddings have been fed to the CNN model as labeled data for TC. As well, the authors have suggested a deep pyramid CNN (DPCNN) [17] which included a deep neural network to increase the computational complexity and also its performance. Therefore, Liao et al. [18] have converted the input sentences into matrices, then each sentence matrix has been represented by a word vector which forms the embeddings for the CNN architecture. The proposed CNN was able to understand sentiments from the tweets. Afterwards, [8], [19] have improved the architecture of CNN model by using consecutive convolutional layers and they have reached good accuracies for sentiment classification. Besides, [20] and [21] have examined different CNN settings to find the optimal CNN configuration and to improve the performance for TC. On the other hand, several recent studies [10], [11], [22] have merged CNN with LSTM for TC purposes. Actually, the authors have fed the convolutional layers of CNN with word embeddings, then the output has been appended to the LSTM layers in order to learn long-term dependencies between words. Finally, the softmax layer takes the output from the LSTM layers and produces the classification result. 3. RESEARCH METHOD 3.1. The proposed network Our suggested CNN model as described in Figure 3, employed two convolutional layers and a max pooling layer, plus two fully connected layers. Actually, we started tokenizing our train data through a vocabulary file which that contains the most frequent words. Then, we randomly initialize the embedding layer to extract meaningful features from the train process. In practice, the embedding layer received the input words and produced feature values for them. Later, each word will be regrouped standing on the learned meaning. Afterward, the two convolutional layers took the output from the embedding layer, slid a window using a kernel size, and applied filters for every window in order to collect more features. Indeed, we appended a dropout layer to each convolutional layer so as to ignore non optimal features. Next, the max-pooling layer selected the maximum values from the convolutional layers and provided this output to two fully-connected layers. In effect, we applied a dropout layer to the first fully-connected layer intending to avoid overfitting, and an activation function rectified linear unit (ReLU) for the exponential growth in computation. Later, the second fully-connected layer produced the vector result which involves a positive or negative classification value. Finally, a Softmax function predicted the label result based on a probability calculation to each class. 
In regards to the optimization, we applied the binary cross entropy as a loss function, then we compare a set of optimizers with our suggested CNN to identify the best implemented models. The results of our architecture network with several optimizer algorithms are presented in the next section.
  • 4.  ISSN: 2252-8938 Int J Artif Intell, Vol. 13, No. 1, March 2024: 451-458 454 3.2. Experiments Our experiments were performed with Python and TensorFlow framework in Google Colab notebook using Google compute engine backend, central processing unit (CPU) mode and 12.68 GB of memory. Actually, we implemented our CNN model to binary classify reviews into positive and negative from three popular datasets in TC: Amazon reviews [23], internet movie database (IMDb) movie-reviews [24], [25], and rotten tomatoes movie-reviews data [24], [26]. In addition, we experimented a set of optimizers with our CNN model to determine, using empirical examination, the best optimizer for TC and sentiment analysis problems. Figure 3. The proposed CNN model for text classification As follows, we describe the implemented optimizer algorithms: − Gradient descent (GD) [27]: is the renowned optimization algorithm employed to perform neural networks. GD utilizes calculus and adjusts the values consistently so as to attain the local minimum. It computes the gradient of the cost function standing on the count of the dataset. In effect, there are three types of GD: (1) Batch gradient descent which computes the gradient of the cost function for the complete dataset, (2) stochastic gradient descent (SGD) which generates parameters adaptation to every training sample, and (3) the Mini-batch gradient descent that updates the parameters for each mini-batch. − Momentum [28]: Actually, using the SGD takes a noisy and more steeply path than GD because of changing parameters in each training example, which means a slow computation time to reach the optimal minimum. Hence, the momentum algorithm surpasses this problem by appending a fragment of a previous oscillation update to the current oscillation, so the process accelerates the time steps and becomes faster. − Adagrad [29]: is a gradient-based optimizer that adapts the learning rate standing on frequent and infrequent parameters. The more the parameters change, the less the learning rate gets updates. Otherwise, it generates a little update of the learning rate for frequent parameters and a large update for the infrequent ones. Therefore, it is widely used in case of a sparse data training.
  • 5. Int J Artif Intell ISSN: 2252-8938  Optimizer algorithms and convolutional neural networks for text classification (Mohammed Qorich) 455 − AdagradDA [29]: refers to adagrad dual averaging which is an adagrad-based algorithm. This optimizer adjusts the regularization of unseen features on each mini batch. Indeed, AdagradDA is basically applied for large sparsity in linear models. − Adadelta [30], [31]: is an advanced algorithm of Adagrad optimizer that adjusts the decaying of the learning rate whereby the model could learn more features. In practice, the algorithm utilizes variables to fix the size of some accumulated gradients. − FTRL [32]: “Follow the (Proximally) regularized leader” is a GD-based algorithm with an alternative representation of the L1 regularization and model coefficients. The optimizer uses a per-coordinate learning rates; besides, it has a high sparsity and convergence properties. − Root mean squared propagation (RMSprop) [33]: is an unpublished optimizer suggested in coursera class by Geoff Hinton [33]. The optimizer stands on an adaptive learning rate method. Similar to Adadelta, RMSprop reduces the monotonically decreasing learning rate and accelerates the optimization. In effect, the algorithm utilizes an average of squared gradients that decays exponentially for dividing the learning rate. − Adam [34]: or adaptive moment estimation is an optimizer that employs adaptive learning rates to update every network weights parameter. Actually, Adam is an alternative extension of SGD and also inherits features from Adagrad and RMSprop. Effectively, it requires fewer parameters tuning and lower memory requirements. Furthermore, it is widely used to solve non-convex problems with large datasets in a faster running time. Next, we present some parameter values we used in our CNN model, − For embedding layer, we employed an embedding dimension Ed=150 with a sequence length of S=500 and a maximum size of vocabulary words of vocab_size=5,000. − Concerning convolutional layers, we set the kernels sizes to the following; k1=5, k2=3 and the number of filters to F1=256, and F2=128. − In connected layers, we defined the count of hidden layers as H1=128, H2=164. For the optimization, we set a dropout to 0.5 and a learning rate with 1e-3 to initialize the whole optimizer algorithms. 3.3. Datasets In the current section, we proffer some information on the implemented data. Actually, we applied all our CNN models on three text reviews datasets with different sizes. As shown in Table 1, we have classified the datasets into large, medium, and small regarding the number of reviews. More details on datasets are given: − Amazon reviews [23] contains 4,000,000 customers’ reviews up to March 2013 about several product categories. Besides, the data is labeled into two classes depending on the review scores ratings from 1 to 5. ‘Positive’ is represented by 5 and 4 stars, and ‘Negative’ by 1 and 2 stars. For experiments, we employed 100,000 reviews from the data which represents the large dataset type. − Rotten Tomatoes (RT) reviews dataset [24], [26] includes 5331 positive snippets of text RT movie- reviews and 5331 negatives ones. RT was first utilized in Pang/Lee ACL 2005 [26] and it is a medium dataset in comparison with the previous one. − IMDB reviews [24], [25] is about 2,000 sentiment text reviews regarding movies. The data contains 1,000 negative and 1000 positive sentences introduced in Pang/Lee ACL 2004 [25]. The IMDB movie-reviews is considered as a small dataset. Table 1. 
3.3. Datasets
In this section, we provide some information on the implemented data. We applied all our CNN models to three text review datasets of different sizes. As shown in Table 1, we have classified the datasets into large, medium, and small according to the number of reviews. More details on the datasets are given below:
− Amazon reviews [23] contains 4,000,000 customer reviews, collected up to March 2013, about several product categories. The data is labeled into two classes according to the review ratings from 1 to 5 stars: 'Positive' is represented by 5 and 4 stars, and 'Negative' by 1 and 2 stars. For the experiments, we employed 100,000 reviews from the data, which represents the large dataset type.
− Rotten Tomatoes (RT) reviews [24], [26] includes 5,331 positive snippets of RT movie reviews and 5,331 negative ones. RT was first used in Pang/Lee ACL 2005 [26] and is a medium dataset in comparison with the previous one.
− IMDB reviews [24], [25] contains about 2,000 sentiment text reviews regarding movies: 1,000 negative and 1,000 positive sentences, introduced in Pang/Lee ACL 2004 [25]. The IMDB movie reviews are considered a small dataset.

Table 1. Number of reviews and types of data in the three datasets
Data                  Type             Number of reviews
Amazon reviews [23]   Large dataset    100,000
RT reviews [24]       Medium dataset   10,662
IMDB reviews [24]     Small dataset    2,000

Practically, we split each data type into three sets: train, validation, and test. Then, we pre-processed the train data with several text filters in order to remove noisy content. Afterwards, we built the label and the content from each input sentence to train our models; a sketch of this step is given below. The results of each model are presented and discussed in the next section.
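The article does not list its exact text filters or split ratios, so the regex cleaning, the 80/10/10 split, and the tokenizer settings in this sketch are illustrative assumptions:

```python
import re
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def clean(text):
    # Simple noise filters: lowercase, strip HTML remnants, keep letters only.
    text = re.sub(r"<[^>]+>", " ", text.lower())
    text = re.sub(r"[^a-z' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def prepare(texts, labels, vocab_size=5000, seq_len=500):
    texts = [clean(t) for t in texts]
    # 80% train, 10% validation, 10% test (assumed ratios).
    x_tr, x_tmp, y_tr, y_tmp = train_test_split(texts, labels, test_size=0.2, random_state=42)
    x_va, x_te, y_va, y_te = train_test_split(x_tmp, y_tmp, test_size=0.5, random_state=42)
    tok = Tokenizer(num_words=vocab_size)
    tok.fit_on_texts(x_tr)                      # fit the vocabulary on train data only
    pad = lambda xs: pad_sequences(tok.texts_to_sequences(xs), maxlen=seq_len)
    return (pad(x_tr), y_tr), (pad(x_va), y_va), (pad(x_te), y_te)
```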
4. RESULTS AND DISCUSSION
In this section, we describe the results obtained with the different optimizer algorithms and our CNN architecture. As shown in Table 2, we applied each optimizer model to the three types of datasets, large, medium, and small, so as to explore the impact of the data amount on the optimizer's efficiency. We evaluated the efficiency of the optimizer models with the accuracy metric. For readability, we named each model after the optimizer selected for the CNN architecture; for example, CNN-Gradient-descent represents the model that employed gradient descent as the optimizer in the CNN model. As a loss function, we used cross entropy in all the models (a sketch of this comparison protocol is given after Figures 4 and 5). The results illustrate that the best CNN model on the small and medium datasets is CNN-Adam. However, CNN-RMSprop surpassed the CNN-Adam model on the large dataset, despite the good accuracy reached by CNN-Adam.

Table 2. The results (accuracy, %) of the optimizer models for each dataset type
Model                  Small dataset   Medium dataset   Large dataset
CNN-Gradient-descent   50              50.75            52.85
CNN-Momentum           50              54.03            66.33
CNN-Adagrad            51.5            51.31            50.69
CNN-AdagradDA          51              50               50
CNN-Adadelta           50              50               50.46
CNN-FTRL               50              50               50
CNN-RMSprop            60.75           71.15            90.48
CNN-Adam               70.5            74.58            90.01

In practice, we notice that the accuracy of the CNN-Gradient-descent model is poor and improves only in small steps, which means that the model learned slowly even as the data amount got larger. With the CNN-Momentum model, by contrast, the accuracy increased and the model obtained better performance; indeed, the momentum method accelerates GD in the pertinent directions and reduces oscillations. On the other hand, the remaining optimizers attained inadequate performance, and their accuracy stayed nearly constant across the three types of datasets. Meanwhile, the CNN-RMSprop model achieved the best accuracy on the large dataset and outperformed every model except CNN-Adam on the small and medium datasets; RMSprop converges faster and requires less parameter tuning than the GD algorithm and its variants. Moreover, CNN-Adam demonstrated its advantage over all the other optimizers and achieved high accuracy regardless of the data amount. In fact, the Adam optimizer combines the advantages of several optimizer algorithms and surpasses the others in terms of computation time, parameter requirements, and ease of implementation. Figures 4 and 5 show the validation accuracy on the three datasets for our two best-performing models, CNN-Adam and CNN-RMSprop. We notice that the larger the data gets, the more the accuracy increases and the better the performance becomes. Also, the two optimizers with our CNN architecture made good progress, with smooth accuracy curves, over the first 100 epochs.

Figure 4. The CNN-Adam validation accuracy

Figure 5. The CNN-RMSprop validation accuracy
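A hedged sketch of this comparison, assuming the build_cnn() and prepare() helpers from the earlier sketches: the same CNN is recompiled with each optimizer, trained, and scored on the test split. The epoch, batch-size, and momentum (0.9) settings are assumptions, and AdagradDA is omitted because it is only exposed through TensorFlow's compat.v1 API:

```python
import tensorflow as tf

OPTIMIZERS = {
    "CNN-Gradient-descent": tf.keras.optimizers.SGD(learning_rate=1e-3),
    "CNN-Momentum": tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
    "CNN-Adagrad": tf.keras.optimizers.Adagrad(learning_rate=1e-3),
    "CNN-Adadelta": tf.keras.optimizers.Adadelta(learning_rate=1e-3),
    "CNN-FTRL": tf.keras.optimizers.Ftrl(learning_rate=1e-3),
    "CNN-RMSprop": tf.keras.optimizers.RMSprop(learning_rate=1e-3),
    "CNN-Adam": tf.keras.optimizers.Adam(learning_rate=1e-3),
}

def compare(train, val, test, epochs=100, batch_size=64):
    results = {}
    for name, opt in OPTIMIZERS.items():
        model = build_cnn()                      # fresh weights for every optimizer
        model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
        model.fit(train[0], train[1], validation_data=val,
                  epochs=epochs, batch_size=batch_size, verbose=0)
        results[name] = model.evaluate(test[0], test[1], verbose=0)[1]
    return results
```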
On the other hand, Table 3 compares the result of our best CNN model with other state-of-the-art CNN models, using the accuracy metric. The CNN-RMSprop model, which implements the RMSprop optimizer with our CNN architecture, obtained the best efficiency with 90.48% accuracy. This model classified the text reviews of the Amazon reviews dataset as negative or positive and surpassed most state-of-the-art CNN binary classification models.

Table 3. Comparison of the results of our best CNN model and some related works
Model                              Accuracy (%)
Chen's and Wang's CNN [22]         76.09
Kim's and Jeong's CNN [8]          81.06
Kim's CNN [15]                     81.5
Johnson's and Zhang's CNN [16]     85.7
Feng's and Cheng's CNN [9]         86.32
DCNN [14]                          86.8
Rehman et al.'s CNN [11]           87
Jain et al.'s CNN [10]             87.1
CNN-RMSprop (our model)            90.48

5. CONCLUSION AND PERSPECTIVES
In this paper, we proposed a new CNN architecture to classify text reviews as negative or positive. With this model, we examined a set of optimizer algorithms in order to determine empirically the best optimizer for sentiment analysis problems. The experiments have shown that RMSprop and Adam are the most efficient optimizers. Moreover, we obtained strong performance compared with the mentioned state-of-the-art architectures, reaching an accuracy of 90.48%. As a perspective, we plan to apply our model to multi-class text classification problems and to further improve the architecture's performance.

REFERENCES
[1] Q. Li et al., “A survey on text classification: From shallow to deep learning,” 2020. [Online]. Available: http://arxiv.org/abs/2008.00364.
[2] A. Gasparetto, M. Marcuzzo, A. Zangari, and A. Albarelli, “Survey on text classification algorithms: From text to predictions,” Information (Switzerland), vol. 13, no. 2, 2022, doi: 10.3390/info13020083.
[3] N. C. Dang, M. N. Moreno-García, and F. De la Prieta, “Sentiment analysis based on deep learning: A comparative study,” Electronics (Switzerland), vol. 9, no. 3, 2020, doi: 10.3390/electronics9030483.
[4] S. Pal, S. Ghosh, and A. Nag, “Sentiment analysis in the light of LSTM recurrent neural networks,” International Journal of Synthetic Emotions, vol. 9, no. 1, pp. 33–39, 2018, doi: 10.4018/ijse.2018010103.
[5] A. Chamekh, M. Mahfoudh, and G. Forestier, “Sentiment analysis based on deep learning in e-commerce,” Lecture Notes in Computer Science, vol. 13369 LNAI, pp. 498–507, 2022, doi: 10.1007/978-3-031-10986-7_40.
[6] S. Sachin, A. Tripathi, N. Mahajan, S. Aggarwal, and P. Nagrath, “Sentiment analysis using gated recurrent neural networks,” SN Computer Science, vol. 1, no. 2, 2020, doi: 10.1007/s42979-020-0076-y.
[7] M. Zulqarnain, R. Ghazali, M. Aamir, and Y. M. M. Hassim, “An efficient two-state GRU based on feature attention mechanism for sentiment analysis,” Multimedia Tools and Applications, 2022, doi: 10.1007/s11042-022-13339-4.
[8] H. Kim and Y. S. Jeong, “Sentiment classification using convolutional neural networks,” Applied Sciences (Switzerland), vol. 9, no. 11, 2019, doi: 10.3390/app9112347.
[9] Y. Feng and Y. Cheng, “Short text sentiment analysis based on multi-channel CNN with multi-head attention mechanism,” IEEE Access, vol. 9, pp. 19854–19863, 2021, doi: 10.1109/ACCESS.2021.3054521.
[10] P. K. Jain, V. Saravanan, and R. Pamula, “A hybrid CNN-LSTM: A deep learning approach for consumer sentiment analysis using qualitative user-generated contents,” ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 20, no. 5, 2021, doi: 10.1145/3457206.
[11] A. U. Rehman, A. K. Malik, B. Raza, and W. Ali, “A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment analysis,” Multimedia Tools and Applications, vol. 78, no. 18, pp. 26597–26613, 2019, doi: 10.1007/s11042-019-07788-7.
[12] S. M. Zaman, M. M. Hasan, R. I. Sakline, D. Das, and M. A. Alam, “A comparative analysis of optimizers in recurrent neural networks for text classification,” 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE 2021), 2021, doi: 10.1109/CSDE53843.2021.9718394.
[13] S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu, and J. Gao, “Deep learning based text classification: A comprehensive review,” 2020. [Online]. Available: http://arxiv.org/abs/2004.03705.
[14] N. Kalchbrenner, E. Grefenstette, and P. Blunsom, “A convolutional neural network for modelling sentences,” 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014) - Proceedings of the Conference, vol. 1, pp. 655–665, 2014, doi: 10.3115/v1/p14-1062.
[15] Y. Kim, “Convolutional neural networks for sentence classification,” EMNLP 2014 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp. 1746–1751, 2014, doi: 10.3115/v1/d14-1181.
[16] R. Johnson and T. Zhang, “Semi-supervised convolutional neural networks for text categorization via region embedding,” Advances in Neural Information Processing Systems, vol. 2015-January, pp. 919–927, 2015.
[17] R. Johnson and T. Zhang, “Deep pyramid convolutional neural networks for text categorization,” ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), vol. 1, pp. 562–570, 2017, doi: 10.18653/v1/P17-1052.
[18] S. Liao, J. Wang, R. Yu, K. Sato, and Z. Cheng, “CNN for situations understanding based on sentiment analysis of twitter data,” Procedia Computer Science, vol. 111, pp. 376–381, 2017, doi: 10.1016/j.procs.2017.06.037.
[19] B. M. Mulyo and D. H. Widyantoro, “Aspect-based sentiment analysis approach with CNN,” International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), vol. 2018-October, pp. 142–147, 2018, doi: 10.1109/EECSI.2018.8752857.
[20] M. Pota, M. Esposito, G. De Pietro, and H. Fujita, “Best practices of convolutional neural networks for question classification,” Applied Sciences (Switzerland), vol. 10, no. 14, 2020, doi: 10.3390/app10144710.
[21] M. A. Nasichuddin, T. B. Adji, and W. Widyawan, “Performance improvement using CNN for sentiment analysis,” IJITEE (International Journal of Information Technology and Electrical Engineering), vol. 2, no. 1, 2018, doi: 10.22146/ijitee.36642.
[22] N. Chen and P. Wang, “Advanced combined LSTM-CNN model for Twitter sentiment analysis,” Proceedings of 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS 2018), pp. 684–687, 2019, doi: 10.1109/CCIS.2018.8691381.
[23] “Amazon 2013,” 2016. [Online]. Available: https://www.kaggle.com/bittlingmayer/amazonreviews.
[24] “IMDB 2004 / Rotten Tomatoes 2005,” 2016. [Online]. Available: https://www.cs.cornell.edu/people/pabo/movie-review-data/.
[25] B. Pang and L. Lee, “A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts,” Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), pp. 271–278, 2004, doi: 10.48550/arXiv.cs/0409058.
[26] B. Pang and L. Lee, “Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales,” ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, pp. 115–124, 2005, doi: 10.3115/1219840.1219855.
[27] N. Ketkar, “Stochastic gradient descent,” Deep Learning with Python, pp. 113–132, 2017.
[28] N. Qian, “On the momentum term in gradient descent learning algorithms,” Neural Networks, vol. 12, no. 1, pp. 145–151, 1999, doi: 10.1016/S0893-6080(98)00116-6.
[29] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” COLT 2010 - The 23rd Conference on Learning Theory, pp. 257–269, 2010.
[30] M. D. Zeiler, “ADADELTA: An adaptive learning rate method,” 2012. [Online]. Available: http://arxiv.org/abs/1212.5701.
[31] Y. Wang, J. Liu, J. Misic, V. B. Misic, S. Lv, and X. Chang, “Assessing optimizer impact on DNN model sensitivity to adversarial examples,” IEEE Access, vol. 7, pp. 152766–152776, 2019, doi: 10.1109/ACCESS.2019.2948658.
[32] H. B. McMahan et al., “Ad click prediction: A view from the trenches,” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1222–1230, 2013, doi: 10.1145/2487575.2488200.
[33] N. Shi, D. Li, M. Hong, and R. Sun, “RMSprop converges with proper hyperparameter,” International Conference on Learning Representations (ICLR), 2021.
[34] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” 3rd International Conference on Learning Representations (ICLR 2015) - Conference Track Proceedings, 2015.

BIOGRAPHIES OF AUTHORS
Mohammed Qorich was born in Meknes, Morocco, in 1993. He received the T.U.D degree in Computer Engineering from the High School of Technology of Meknes, Moulay Ismail University (Morocco), in 2013, and the P.L degree in IT development from the Faculty of Sciences Ain Chock Casablanca, Hassan II University (Morocco), in 2014.
He received the M.S degree in educational technology from the Higher Normal School of Tetouan, Abdelmalek Essaadi University (Morocco), in 2020. Currently, he is a Ph.D. candidate at Moulay Ismail University, Meknes, Morocco. His research interests include natural language processing, deep learning, text classification, and chatbots. He can be contacted at email: [email protected].
Rajae El Ouazzani received her master's degree in Computer Science and Telecommunication from the Mohammed V University of Rabat (Morocco) in 2006 and her Ph.D. in Image and Video Processing from the High National School of Computer Science and Systems Analysis (Morocco) in 2010. Since 2011, she has been a Professor at the High School of Technology of Meknes, Moulay Ismail University, Morocco. Since 2007, she has authored several papers in international journals and conferences. Her domains of interest include multimedia data processing and telecommunications. She can be contacted at email: [email protected].