IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 4, December 2024, pp. 4241~4248
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i4.pp4241-4248
Journal homepage: http://ijai.iaescore.com
Image analysis for classifying coffee bean quality using a multi-feature and machine learning approach

Anindita Septiarini1, Hamdani Hamdani1, Aji Ery Burhandeny2, Damar Nurcahyono3, Surya Eka Priyatna4

1 Department of Informatics, Faculty of Engineering, Mulawarman University, Samarinda, Indonesia
2 Department of Electronic Engineering, Faculty of Engineering, Mulawarman University, Samarinda, Indonesia
3 Department of Information Technology, Politeknik Negeri Samarinda, Samarinda, Indonesia
4 Department of Information Technology, Faculty of Da'wah and Communication Sciences, Antasari State Islamic University, Banjarmasin, Indonesia
Article Info

Article history:
Received Nov 27, 2023
Revised Feb 11, 2024
Accepted Feb 28, 2024

Keywords:
Coffee beans
Feature selection
K-means
Machine learning
Principal component analysis

ABSTRACT

Price and customer satisfaction depend on coffee bean quality, so the coffee industry must analyze coffee bean quality. Global demand for robusta coffee is high, yet coffee bean quality is mostly understood by coffee industry professionals. Thus, an image analysis approach based on computer vision is required for classifying robusta coffee bean quality. Image acquisition, region of interest (ROI) detection, pre-processing, segmentation, feature extraction, feature selection, and classification are covered in this study. A multi-feature representation derived from color, shape, and texture features was employed in feature extraction, followed by feature selection using principal component analysis (PCA). Several machine-learning methods classified the coffee beans. The method performance was assessed using precision, recall, and accuracy. The selected features with the backpropagation neural network (BPNN) classifier outperformed the others with 98.54% accuracy.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Anindita Septiarini
Department of Informatics, Faculty of Engineering, Mulawarman University
St. Sambaliung, No. 9, Samarinda, Indonesia
Email: anindita@unmul.ac.id
1. INTRODUCTION
The utilization of computers and associated technologies is expanding and diversifying rapidly, and this trend is clearly observed in the field of agriculture. There exist multiple
instances wherein computers have been employed in the agricultural sector, encompassing the monitoring of
fruit ripeness [1], [2], land management [3], and plant development [4], [5]. Coffee, as one of the most
widely consumed beverages globally, holds significant importance as an economic commodity. The global
popularity of coffee can be attributed to its stimulating properties and the preference for its bitter flavor.
Coffee serves as a substantial provider of caffeine for a considerable number of individuals. While previous
research has established a connection between coffee and caffeine intake and adverse health effects, recent
studies have presented evidence suggesting that the compounds found in coffee, such as caffeine, chlorogenic
acids, kahweol, cafestol, and various micronutrients (such as magnesium, potassium, and phosphorus), may
enhance the immune system and provide protection against the development of conditions such as obesity,
diabetes, neurological diseases, osteoporosis, and pancreatic cancer [6].
The coffee industry values quality because bean quality is tied to scarcity, price, and consumer satisfaction. Robusta coffee beans, which are widely grown, have a distinct taste and aroma. Their quality depends on soil composition, climate, and processing method. Coffee
prices depend on bean quality. It is crucial to note that not all growers and coffee shop owners can identify coffee bean quality, so errors may occur when this expertise is lacking. Manual grading is also time-consuming and produces inconsistent outcomes because of limits in visual perception, fatigue, and differences in how coffee quality is judged. Since robusta coffee beans are usually evaluated by their visual characteristics, computer vision is well suited to this task: it extracts visual traits of the beans that are highly predictive of quality, typically based on color, shape, and texture.
Numerous research investigations have been conducted in computer vision, focusing on applications in food processing. These studies encompass a range of food items, such as banana [7], honey [8], date fruit [9],
palm oil [10], and coffee [11], [12]. The construction of this system involves several general processes, namely
pre-processing, segmentation, feature extraction, and classification [13]. Common pre-processing tasks often
involve scaling [14] and converting color spaces [15]. The Otsu thresholding method [13], K-means clustering
algorithm [16], and edge detection approach [17] were subsequently employed, along with many established
segmentation methodologies. The extractable features that can be considered for edibles encompass color [10],
shape [18], and texture [11]. Moreover, naïve Bayes (NB) [10], k-nearest neighbor (KNN) [19], and support
vector machines (SVM) [10] are frequently utilized in the classification process.
Recent studies have used machine learning to classify coffee beans across agricultural situations.
Color and shape helped identify high-quality beans. The investigation used image processing and machine
learning on an Arduino mega board. Essential criteria were assessed to determine high-quality green coffee
beans. KNN was used to evaluate coffee beans and classify them by defect type. Logic, image processing,
and supervised learning algorithms are executed and coded on the Arduino board. The machine vision system
has an average accuracy of 94.79% for quality and 95.78% for defect-type evaluation. However, long berry
bean classification was 98.05% accurate [20]. Subsequently, a variety of machine learning methodologies
such as SVM, deep neural networks (DNN), and random forest (RF) were utilized to evaluate the
significance of shape and color characteristics in the assessment of faults in coffee beans. The data presented
in the study highlights the significance of color descriptors in the classification of faults in coffee beans. The
classification models consider the most significant features, obtained from the average value of the G component in the RGB color space and the average value of the V component in the HSV color space. All the classifier models
exhibited comparable performance, with the best accuracy value above 88% [12].
Several efforts were presented in order to identify and categorize coffee fruits, as well as to map the
stage of maturation of these fruits during the harvest process. The methodology was executed utilizing the
Darknet framework. The YOLOv3-tiny object identification system identified and categorized coffee fruit.
The collection contains 90 videos from the 2020 arabica coffee (Catuaí 144) harvest, shot at a coffee
harvester's discharge conveyor termination point. A business area in Patos de Minas, Minas Gerais, Brazil
hosted the recordings. The model performed best at around 3300th iteration with an 800×800-pixel image
input. The model had 84% mean average precision (mAP), 82% F1-score, 83% precision, and 82% recall in
the validation set. The precision values for unripe, ripe, and overripe coffee fruits were 86%, 85%, and 80%,
respectively [21]. Another study used a convolutional network on an inexpensive micro-controller board to
classify coffee leaf diseases locally without the internet. Early diagnosis of coffee plant diseases was crucial
for optimal output and production quality. Two datasets and development board images were used in this
investigation. The collection included around 6000 images from six disease classes. The incorporated cascade and single-stage systems were 98% and 96% accurate, respectively. These findings imply that such architectures can detect coffee plantation diseases [22].
This study presents a proposed method for classifying coffee bean quality based on computer vision
techniques. The method utilizes color, shape, and texture data extracted from the RGB, HSV, and L*a*b
color spaces. The BPNN was employed as the classifier in this work. The objective of this method was to
ascertain the classification of coffee beans according to their quality by utilizing image data. The quality
types were classified into four classes: intact, perforated, wrinkled, and cracked.
2. MATERIALS AND METHODS
The approach predicted the quality class of all robusta coffee bean photos. Its main processes were
region of interest (ROI) detection, pre-processing, feature extraction, selection, and classification. The
method has two phases: training and testing. Training and testing sets provided input for each phase. Both
phases were handled differently. ROI detection located the coffee bean area in the image using K-means clustering. In the training phase, pre-processing converted the RGB data into grayscale, HSV, and L*a*b representations. Afterward, color, texture, and shape were used to extract features. Subsequently, a feature selection procedure, implemented with principal component analysis (PCA), was used to choose the most significant features and simplify classification. In the testing phase, pre-processing converted RGB only into the color spaces required by the selected features, and feature extraction was restricted to the features retained by the selection step. A
prediction class (intact/perforated/wrinkled/cracked) was determined from selected features in the final step.
Figure 1 illustrates the robusta coffee bean quality classification.
Figure 1. Overview of all steps in the proposed method for quality classification of robusta coffee bean
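To make the two-phase flow concrete, the sketch below outlines how the training and testing phases described above could be wired together in Python. The helper names (detect_roi, extract_features) and the PCA/MLPClassifier settings are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the two-phase pipeline (training and testing), assuming
# hypothetical helpers detect_roi() and extract_features() that implement
# sections 2.2-2.4; parameter choices are illustrative, not the authors' code.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

def build_feature_matrix(images, detect_roi, extract_features):
    """Run ROI detection and multi-feature extraction on every image."""
    return np.vstack([extract_features(detect_roi(img)) for img in images])

def train_pipeline(train_images, train_labels, detect_roi, extract_features):
    X = build_feature_matrix(train_images, detect_roi, extract_features)
    pca = PCA(n_components=0.95).fit(X)              # feature selection (PCA)
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
    clf.fit(pca.transform(X), train_labels)          # BPNN-style classifier
    return pca, clf

def predict_pipeline(test_images, pca, clf, detect_roi, extract_features):
    X = build_feature_matrix(test_images, detect_roi, extract_features)
    return clf.predict(pca.transform(X))             # intact/perforated/wrinkled/cracked
```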
2.1. Dataset
The dataset in this study consisted of images of robusta coffee beans. JPEG images were taken with the built-in camera of a Xiaomi 5A smartphone. Each coffee bean was placed on a white background in the center of a 28×19×18 cm studio minibox, and a 10 cm gap between the camera and the coffee beans was maintained by deliberately positioning and orienting the camera. The smartphone camera has a 13-megapixel sensor, and each image has dimensions of 1560×1560 pixels. The dataset contained 1440 coffee bean images, 360 per class, divided into four classes: intact, perforated, wrinkled, and cracked, with example images shown in Figures 2(a) to 2(d), respectively.
Figure 2. Examples of coffee bean images with various quality types: (a) intact, (b) perforated, (c) wrinkled,
and (d) cracked
2.2. Region of interest detection
ROI detection attempts to generate a sub-image mostly of the coffee bean area. During this stage,
the initial resolution of the image was reduced from 1560×1560 pixels to 500×500 pixels [3] to minimize the
computational time. Subsequently, a color space conversion was performed from RGB to L*a*b; this enabled
the system to accurately differentiate between object and background regions in various scenarios. The
utilization of L*a*b color spaces necessitates a conversion procedure that relies on the values within the RGB
color space, which are explicitly defined as in [23].
The result of the conversion of an original image in RGB Figure 3(a) to L*a*b color space is
depicted in Figure 3(b). Furthermore, by employing the clustering with the K-means algorithm [16], the area
of coffee beans was approximated. Due to the division of the image's area into two distinct regions—the
coffee bean region and the background region—the value of K was set to two. The steps of the K-means
algorithm are defined as follows [24]:
− Step 1: Initialize the number of clusters k and the cluster centres.
− Step 2: For each pixel of the image, calculate the Euclidean distance d between the pixel and each cluster centre using (1):
$d = \lVert p(x, y) - c_k \rVert$ (1)
− Step 3: Assign each pixel to the nearest centre based on the distance d.
− Step 4: After all pixels have been assigned, recalculate the new position of each centre using (2):

$C_k = \frac{1}{k} \sum_{y \in c_k} \sum_{x \in c_k} p(x, y)$ (2)

− Step 5: Repeat the process until it satisfies the tolerance or error value.
− Step 6: Reshape the clustered pixels into an image.
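The listed steps can be transcribed almost directly into code. The following is a minimal NumPy sketch for clustering the pixels of an L*a*b image with K = 2; the random initialization and the convergence tolerance are assumptions made for illustration.

```python
import numpy as np

def kmeans_segment(lab_image, k=2, max_iter=50, tol=1e-4):
    """Cluster the pixels of an L*a*b image into k groups (steps 1-6 above)."""
    pixels = lab_image.reshape(-1, 3).astype(np.float64)
    rng = np.random.default_rng(0)
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)]   # step 1
    for _ in range(max_iter):
        # steps 2-3: Euclidean distance to every centre, assign the nearest
        d = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 4: recompute each centre as the mean of its assigned pixels
        new_centres = np.array([pixels[labels == i].mean(axis=0)
                                if np.any(labels == i) else centres[i]
                                for i in range(k)])
        if np.linalg.norm(new_centres - centres) < tol:                # step 5
            break
        centres = new_centres
    return labels.reshape(lab_image.shape[:2])                         # step 6
```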
The resulting image of the K-means algorithm is shown in Figure 3(c). Afterward, a morphological
operation was applied using dilation; hence, the coffee bean area approaches the original, and the result is
depicted in Figure 3(d). Subsequently, the setting of the coffee bean area was carried out as the ground for
defining the ROI image boundary based on the yellow box, as shown in Figure 3(e). Accordingly, the formed
ROI images in binary and RGB color space are shown in Figures 3(f) and 3(g).
Figure 3. The resulting image of each process in ROI detection: (a) original image in RGB color space, (b) L*a*b color space, (c) K-means clustering, (d) morphological operation, (e) setting the area of the ROI image, (f) ROI image in binary, and (g) ROI image in RGB color space
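A possible implementation of the whole ROI stage (resize, L*a*b conversion, K-means mask, dilation, and bounding-box cropping) with OpenCV is sketched below; deciding which cluster is the bean by its lower mean lightness is an assumption that relies on the white background of this dataset.

```python
import cv2
import numpy as np

def extract_roi(bgr_image, kmeans_segment):
    """ROI detection as described above; kmeans_segment is the clustering
    routine sketched earlier (an illustrative helper, not the paper's code)."""
    small = cv2.resize(bgr_image, (500, 500))
    lab = cv2.cvtColor(small, cv2.COLOR_BGR2LAB)
    labels = kmeans_segment(lab, k=2)
    # Assumption: the background is white, so the cluster with the lower
    # mean lightness (L channel) is taken to be the coffee bean.
    bean = int(np.argmin([lab[..., 0][labels == i].mean() for i in (0, 1)]))
    mask = (labels == bean).astype(np.uint8) * 255
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8), iterations=2)  # morphology
    ys, xs = np.nonzero(mask)                                         # ROI box
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    return small[y0:y1 + 1, x0:x1 + 1], mask[y0:y1 + 1, x0:x1 + 1]
```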
2.3. Pre-processing
This procedure generated the parameter values needed for feature extraction. This study examined color, texture, and shape: RGB images were converted to L*a*b and HSV to create color features, RGB images were converted to grayscale to create texture features, and binary images were used to build shape features. In order to improve classification results, the color space must be changed during pre-processing. Agricultural research commonly uses RGB for object classification, while some investigations have employed the L*a*b and HSV color spaces. Using different color spaces requires a conversion technique based on the RGB values [23]; the RGB-to-L*a*b conversion follows the definitions given in [23]. In the HSV color space, (3) and (4) calculate the hue (H), while the saturation (S) and value (V) are computed using (5) and (6).
$H = \begin{cases} \theta, & B \leq G \\ 360 - \theta, & B > G \end{cases}$ (3)

where:

$\theta = \cos^{-1}\left\{ \dfrac{\tfrac{1}{2}\left[(R-G)+(R-B)\right]}{\left[(R-G)^2+(R-B)(G-B)\right]^{1/2}} \right\}$ (4)

$S = \begin{cases} 0, & \max(R, G, B) = 0 \\ 1 - \dfrac{\min(R, G, B)}{\max(R, G, B)}, & \text{otherwise} \end{cases}$ (5)

$V = \max(R, G, B)$ (6)
Furthermore, the RGB image also needed to be converted to a grayscale image, since this work applied texture features; these feature parameters are later used as input for the classification process. The RGB-to-grayscale conversion produces the intensity (I) value using (7) [11].

$I = \frac{1}{3}(R + G + B)$ (7)
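For reference, (3)-(7) can be transcribed for a single pixel as follows. This is a direct reading of the equations (with a small guard against division by zero), not an optimized converter; in practice, library routines such as OpenCV's cvtColor would normally be used instead.

```python
import math

def rgb_to_hsv_and_intensity(r, g, b):
    """Transcribe (3)-(7) for one pixel with R, G, B given as numbers."""
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b)) or 1e-12        # avoid /0
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = theta if b <= g else 360.0 - theta                            # (3)-(4)
    mx, mn = max(r, g, b), min(r, g, b)
    s = 0.0 if mx == 0 else 1.0 - mn / mx                             # (5)
    v = mx                                                            # (6)
    i = (r + g + b) / 3.0                                             # (7)
    return h, s, v, i
```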
2.4. Feature extraction
Coffee bean image feature extraction retrieves color, texture, and shape information. Some studies analyze one aspect, while others analyze several. This study extracts color features from statistical values of the RGB, HSV, and L*a*b color models. RGB, HSV, and L*a*b are
useful color characteristics in many applications, and converting RGB to HSV and L*a*b keeps the number of color dimensions and features limited. Texture feature extraction using the gray level co-occurrence matrix (GLCM) follows. The shape feature extraction approach uses statistical characteristics and shape distances in the binary image. Table 1 lists the number of features per method. Adding features does not necessarily enhance model performance; thus, accurate classification requires careful feature selection.
Table 1. The number of features
Type of features   Method                         Number of features
Color              RGB model                      3
                   HSV model                      3
                   L*a*b model                    3
Texture            Statistical features of GLCM   84
Shape              Area-based                     2
2.4.1. Color feature extraction
The HSV, RGB, and L*a*b color spaces have been used as color features to differentiate various objects [11]. Equation (8) uses the mean to summarize the statistical properties of each color model channel, where μ is the average value of a channel and M and N are the image dimensions. This study applies these HSV, RGB, and L*a*b feature extraction methods to the coffee bean data: the red, green, and blue values are obtained from the RGB image; the hue, saturation, and value from the HSV image; and the lightness (L) and the color-opponent dimensions (a and b), indicating redness–greenness and blueness–yellowness, from the L*a*b image.
$\mu = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} I_{ij}$ (8)
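A sketch of (8) applied per channel, using OpenCV color space conversions to obtain the nine color features of Table 1, is shown below; the BGR input ordering is an assumption that matches OpenCV's imread convention.

```python
import cv2
import numpy as np

def color_features(roi_bgr):
    """Mean of each channel (8) in the RGB, HSV, and L*a*b color spaces,
    yielding the nine color features listed in Table 1."""
    rgb = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2RGB)
    hsv = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2HSV)
    lab = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2LAB)
    return np.array([img[..., c].mean()                  # mu per channel
                     for img in (rgb, hsv, lab) for c in range(3)])
```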
2.4.2. Texture feature extraction
Coffee beans can also be characterized by their texture, since coffee bean fiber has distinct visual and textural properties. The color characteristics of the perforated, wrinkled, and cracked classes are similar to those of the intact class, making identification difficult. Only GLCM is used to extract texture features in this investigation; GLCM has been used for texture feature extraction with good results.
Calculating the probability of the adjacency relationship between two pixels at a specific distance
and angle orientation yields the GLCM [1]. The image's statistical attributes are calculated after the co-occurrence matrix has been collected. GLCM statistical features are computed for four angles (0°, 45°, 90°, and 135°) and one distance (1 pixel). GLCM(i, j) is the joint probability distribution of a pixel pair with gray levels i and j, and the image gray levels determine the number of GLCM matrix rows and columns; L is the number of computed gray levels minus 1, with the grayscale values of an image lying between 0 and 255 [7]. The types of features used in GLCM in this research include: auto
correlation, cluster prominence, cluster shade, contrast, correlation, difference entropy, difference variance,
dissimilarity, energy, entropy, IDM, information measures of correlation 1, and 2, inverse difference,
maximum probability, sum averages, sum entropy, sum of squares variances, sum variance, IDM normalized,
inverse difference normalized.
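As an illustration, scikit-image's graycomatrix and graycoprops can compute the GLCM at one distance and four angles; note that graycoprops exposes only a subset of the 21 descriptors listed above, so the sketch below produces fewer than the 84 texture features used in this study.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_uint8):
    """GLCM texture features at distance 1 and angles 0, 45, 90, and 135 degrees."""
    glcm = graycomatrix(gray_uint8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "dissimilarity", "homogeneity", "energy", "correlation")
    # Each property yields one value per (distance, angle) pair.
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])
```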
2.4.3. Shape feature extraction
The K-means method was employed to convert the coffee bean image into a binary image in order to remove noise and isolate the shape of the coffee bean bodies. The goal of the K-means algorithm is to cluster objects by
grouping them with the K points that are closest to them in the space. The values of cluster centroids are
updated iteratively until the optimal clustering results are achieved. Various shape parameters, such as
eccentricity (e) and perimeter (p), are calculated to assess the characteristics of coffee bean shape [20].
$e = \sqrt{1 - \dfrac{b^2}{a^2}}$ (9)

$p = 2\pi\sqrt{\dfrac{a^2 + b^2}{2}}$ (10)
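A sketch of (9) and (10) computed from the binary ROI mask with scikit-image follows, assuming a and b denote the semi-major and semi-minor axes of the ellipse fitted to the bean region (the paper does not define them explicitly).

```python
import math
import numpy as np
from skimage.measure import label, regionprops

def shape_features(mask):
    """Eccentricity (9) and ellipse-based perimeter (10) of the largest region."""
    regions = regionprops(label(mask > 0))
    bean = max(regions, key=lambda r: r.area)        # keep the bean, drop noise
    a = bean.major_axis_length / 2.0                 # assumed semi-major axis
    b = bean.minor_axis_length / 2.0                 # assumed semi-minor axis
    e = math.sqrt(1.0 - (b * b) / (a * a))                      # (9)
    p = 2.0 * math.pi * math.sqrt((a * a + b * b) / 2.0)        # (10)
    return np.array([e, p])
```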
2.5. Feature selection
The analysis should consider the total number of features in order to identify the most beneficial or highly discriminative features within the utilized dataset. The present study employed PCA as the method for conducting feature selection. PCA is a well-established technique in the field of pattern recognition and computer vision. It serves as a standard method for feature extraction and data representation, commonly employed to identify and recognize objects. As a statistical method, it reduces the number of dimensions in data
sets with many variables. It makes object recognition work better and has been shown to lower the feature dimensionality while raising the accuracy value [25].
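In practice, the PCA-based feature selection can be realized with scikit-learn as sketched below; standardizing the features and keeping the components that explain 95% of the variance are assumptions, since the paper does not state how many components were retained.

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def select_features(X_train, X_test, variance=0.95):
    """Fit PCA on the training features and project both splits onto the
    retained principal components."""
    scaler = StandardScaler().fit(X_train)
    pca = PCA(n_components=variance).fit(scaler.transform(X_train))
    return (pca.transform(scaler.transform(X_train)),
            pca.transform(scaler.transform(X_test)))
```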
2.6. Classification
Classification assigns each data sample to a class. Machine learning has been applied to several plant-related tasks; by studying algorithms and using data to forecast, it automates such operations. Instead of following fixed instructions, the algorithm uses a model built from sample input to estimate data and make judgments, and mathematical and statistical models predict unknown data using training data. This study classified coffee
bean quality using machine learning. Backpropagation neural network (BPNN), linear discriminant analysis
(LDA), KNN, NB, and SVM were used. Previous research on numerous plant specimens used these
methodologies [19], [25].
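The five classifiers can be compared with scikit-learn as sketched below; the hyperparameters are illustrative defaults rather than the settings used in this study, and MLPClassifier stands in for the BPNN.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def compare_classifiers(X_train, y_train, X_test, y_test):
    """Train each classifier on the selected features and report its accuracy."""
    models = {
        "BPNN": MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "LDA": LinearDiscriminantAnalysis(),
        "NB": GaussianNB(),
        "SVM": SVC(kernel="rbf"),
    }
    return {name: accuracy_score(y_test, m.fit(X_train, y_train).predict(X_test))
            for name, m in models.items()}
```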
2.7. Performance measurement
Feature selection and different classification approaches are used to evaluate the proposed method's
performance and determine the most appropriate and robust method for the data set. This study evaluates 1440 robusta bean photos (360 intact, 360 perforated, 360 wrinkled, and 360 cracked). The performance of the classification method was assessed using accuracy [25], which was calculated based on the multiclass confusion matrix and is defined by (11) [25].
$\mathit{Accuracy} = \dfrac{\text{correct number of data}}{\text{total number of data}} \times 100$ (11)
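Equation (11) corresponds to the trace of the multiclass confusion matrix divided by the total sample count, for example:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def accuracy_percent(y_true, y_pred):
    """Accuracy per (11): correctly classified samples (the diagonal of the
    multiclass confusion matrix) over all samples, as a percentage."""
    cm = confusion_matrix(y_true, y_pred)
    return 100.0 * np.trace(cm) / cm.sum()
```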
3. RESULTS AND DISCUSSION
A total of 1440 images covering four classes, namely intact, perforated, wrinkled, and cracked, were used to evaluate the method. The classification results were compared for feature sets obtained without and with feature selection. Without feature selection, the feature sets were formed by the GLCM, area-based, and color-based (RGB, HSV, and L*a*b) features; with feature selection, PCA was applied to obtain the selected features. The experiment utilized five classifiers: BPNN, KNN, LDA, NB, and SVM. The method's performance was assessed using accuracy. The comparison of method performance with the different classifiers and multi-feature sets, indicated by the accuracy value, is summarized in Table 2.
Table 2 demonstrates the testing of situations without feature selection. The tests were conducted
using texture characteristics obtained by the GLCM approach, texture data combined with shape features
(Area-based), and texture features combined with HSV, L*a*b, and RGB color spaces. The BPNN classifier
achieved the maximum accuracy of 94.83% while utilizing the GLCM. Subsequently, the LDA, KNN, SVM,
and NB classifiers yielded decreasing accuracy values of 92.08%, 85.21%, 83.54%, and 80.83%,
respectively. The BPNN classifier achieved the maximum accuracy of 97.86% for the area-based feature set,
while the NB classifier had the lowest accuracy of 58.54%. Performance was also assessed using color attributes derived from three color spaces: HSV, L*a*b, and RGB. With the HSV feature set, SVM attained the best accuracy of 92.92%, while with the L*a*b and RGB feature sets the BPNN achieved an accuracy of 97.71%.
Table 2. Performance comparison of the classifiers with various feature sets based on accuracy value (%)
Classifier   Without feature selection                          With feature selection
             GLCM     Area-based   HSV      L*a*b    RGB        PCA
BPNN         94.83    97.86        84.79    97.71    97.71      98.54
KNN          85.21    84.38        85.42    90.83    89.79      90.83
LDA          92.08    91.46        77.92    91.46    68.33      80.63
NB           80.83    58.54        53.96    53.54    48.54      55.83
SVM          83.54    72.71        92.92    97.50    94.83      97.50
In the results obtained via PCA feature selection, the backpropagation classifier attained the
maximum accuracy value of 98.54%. By contrast, the NB classifier yielded the lowest results, achieving an
accuracy of 55.83%. Backpropagation demonstrates high accuracy across all feature test scenarios even without feature selection, and the highest overall accuracy of 98.54% was achieved by the BPNN when PCA feature selection was applied to the extracted multi-feature set. The NB classifier performed the worst in every trial scenario.
The observed results indicate that a combination of texture, shape, and color features, followed by feature
selection to limit the number of features, might lead to high accuracy throughout the classification process.
The BPNN classifier performs better than other classifiers by minimizing errors in each scenario.
4. CONCLUSION
This study classifies robusta coffee beans by quality into four classes: intact, perforated, wrinkled, and cracked. The procedure involves ROI detection, pre-processing, segmentation, feature extraction, feature selection, and classification, and each step is carried out to accurately classify the coffee beans and determine their quality. The study tested scenarios with texture features, texture with shape features, and texture with color space values (HSV, L*a*b, and RGB). The BPNN classifier routinely outperforms the other coffee bean quality assessment methods; applying the PCA feature selection technique to the GLCM, area-based, and L*a*b features gives the best result, with 98.54% accuracy. Exploring additional scenarios and attributes could further improve the variety and quality of this research.
ACKNOWLEDGEMENTS
The authors would like to thank the Faculty of Engineering at Mulawarman University in Samarinda,
Indonesia, for providing financial support for the research conducted in 2023 (No. 497/UN17.L1/HK/2023).
REFERENCES
[1] M. R. Fiona, S. Thomas, I. J. Maria, and B. Hannah, “Identification of ripe and unripe citrus fruits using artificial neural
network,” in Journal of Physics: Conference Series, IOP Publishing, 2019, doi: 10.1088/1742-6596/1362/1/012033.
[2] S. Munera, F. Hernández, N. Aleixos, S. Cubero, and J. Blasco, “Maturity monitoring of intact fruit and arils of pomegranate cv.
‘Mollar de Elche’ using machine vision and chemometrics,” Postharvest Biology and Technology, vol. 156, 2019, doi:
10.1016/j.postharvbio.2019.110936.
[3] Hamdani, A. Septiarini, and D. M. Khairina, “Model assessment of land suitability decision making for oil palm plantation,” in
2016 2nd International Conference on Science in Information Technology, ICSITech 2016: Information Science for Green Society
and Environment, IEEE, 2017, pp. 109–113, doi: 10.1109/ICSITech.2016.7852617.
[4] A. Yudhana, R. Umar, and F. M. Ayudewi, “The monitoring of corn sprouts growth using the region growing methods,” in
Journal of Physics: Conference Series, IOP Publishing, Nov. 2019, doi: 10.1088/1742-6596/1373/1/012054.
[5] A. Sezgin and V. Küçük, “Computer science monitoring plant growth with image processing methods and artificial intelligence supported
agriculture system,” in International Artificial Intelligence and Data Processing Symposium, 2022, pp. 165-176, doi: 10.53070/bbd.1172774.
[6] B. Açıkalın and N. Sanlier, “Coffee and its effects on the immune system,” Trends in Food Science & Technology, vol. 114, pp.
625–632, Aug. 2021, doi: 10.1016/j.tifs.2021.06.023.
[7] E. Piedad, J. I. Larada, G. J. Pojas, and L. V. V Ferrer, “Postharvest classification of banana (Musa acuminata) using tier-based
machine learning,” Postharvest Biology and Technology, vol. 145, pp. 93–100, 2018, doi: 10.1016/j.postharvbio.2018.06.004.
[8] A. Noviyanto and W. H. Abdulla, “Honey botanical origin classification using hyperspectral imaging and machine learning,”
Journal of Food Engineering, vol. 265, 2020, doi: 10.1016/j.jfoodeng.2019.109684.
[9] D. Zhang, D. J. Lee, B. J. Tippetts, and K. D. Lillywhite, “Date maturity and quality evaluation using color distribution analysis
and back projection,” Journal of Food Engineering, vol. 131, pp. 161–169, 2014, doi: 10.1016/j.jfoodeng.2014.02.002.
[10] A. Septiarini, H. Hamdani, T. Hardianti, E. Winarno, S. Suyanto, and E. Irwansyah, “Pixel quantification and color feature extraction
on leaf images for oil palm disease identification,” in 7th International Conference on Electrical, Electronics and Information
Engineering: Technological Breakthrough for Greater New Life, 2021, pp. 1–5, doi: 10.1109/ICEEIE52663.2021.9616645.
[11] W. G. D. Costa, I. D. P. Barbosa, J. E. D. Souza, C. D. Cruz, M. Nascimento, and A. C. B. D. Oliveira, “Machine learning and
statistics to qualify environments through multi-traits in Coffea arabica,” PLoS ONE, vol. 16, no. 1, pp. e0245298–e0245298, Jan.
2021, doi: 10.1371/journal.pone.0245298.
[12] F. F. L. D. Santos, J. T. F. Rosas, R. N. Martins, G. D. M. Araújo, L. D. A. Viana, and J. D. P. Gonçalves, “Quality assessment of coffee
beans through computer vision and machine learning algorithms,” Coffee Science, vol. 15, no. 1, pp. 1–9, 2020, doi: 10.25186/.v15i.1752.
[13] A. Septiarini, H. Hamdani, A. Rifani, Z. Arifin, N. Hidayat, and H. Ismanto, “Multi-class support vector machine for arabica coffee
bean roasting grade classification,” in ICOIACT 2022 - 5th International Conference on Information and Communications Technology:
A New Way to Make AI Useful for Everyone in the New Normal Era, 2022, pp. 407–411, doi: 10.1109/ICOIACT55506.2022.9971897.
[14] R. S. El-Sayed and M. N. El-Sayed, “Classification of vehicles’ types using histogram oriented gradients: comparative study and
modification,” IAES International Journal of Artificial Intelligence, vol. 9, no. 4, pp. 700–712, 2020, doi: 10.11591/ijai.v9.i4.pp700-712.
[15] M. Sharif, M. A. Khan, Z. Iqbal, M. F. Azam, M. I. U. Lali, and M. Y. Javed, “Detection and classification of citrus diseases in
agriculture based on optimized weighted segmentation and feature selection,” Computers and Electronics in Agriculture, vol. 150,
pp. 220–234, 2018, doi: 10.1016/j.compag.2018.04.023.
[16] A. Septiarini, H. Hamdani, S. U. Sari, H. Rahmania Hatta, N. Puspitasari, and W. Hadikurniawati, “Image processing techniques
for tomato segmentation applying k-means clustering and edge detection approach,” in 2021 International Seminar on Machine
Learning, Optimization, and Data Science, ISMODE 2021, IEEE, 2022, pp. 92–96, doi: 10.1109/ISMODE53584.2022.9742740.
[17] J. Lu et al., “Lightweight green citrus fruit detection method for practical environmental applications,” Computers and
Electronics in Agriculture, vol. 215, 2023, doi: 10.1016/j.compag.2023.108205.
[18] J. Liang, K. Huang, H. Lei, Z. Zhong, Y. Cai, and Z. Jiao, “Occlusion-aware fruit segmentation in complex natural environments
under shape prior,” Computers and Electronics in Agriculture, vol. 217, 2024, doi: 10.1016/j.compag.2024.108620.
[19] X. Yang, R. Zhang, Z. Zhai, Y. Pang, and Z. Jin, “Machine learning for cultivar classification of apricots (Prunus armeniaca L.)
based on shape features,” Scientia Horticulturae, vol. 256, 2019, doi: 10.1016/j.scienta.2019.05.051.
[20] H. Li, W. S. Lee, and K. Wang, “Identifying blueberry fruit of different growth stages using natural outdoor color images,”
Computers and Electronics in Agriculture, vol. 106, pp. 91–101, 2014, doi: 10.1016/j.compag.2014.05.015.
[21] García, C. Becerra, and Hoyos, “Quality and defect inspection of green coffee beans using a computer vision system,” Applied
Sciences, vol. 9, no. 19, Oct. 2019, doi: 10.3390/app9194195.
[22] H. C. Bazame, J. P. Molin, D. Althoff, and M. Martello, “Detection, classification, and mapping of coffee fruits during harvest
with computer vision,” Computers and Electronics in Agriculture, vol. 183, 2021, doi: 10.1016/j.compag.2021.106066.
[23] F. G. -Lamont, J. Cervantes, A. López, and L. Rodriguez, “Segmentation of images by color features: A survey,”
Neurocomputing, vol. 292, pp. 1–27, 2018, doi: 10.1016/j.neucom.2018.01.091.
[24] N. Dhanachandra, K. Manglem, and Y. J. Chanu, “Image segmentation using K-means clustering algorithm and subtractive
clustering algorithm,” Procedia Computer Science, vol. 54, pp. 764–771, 2015, doi: 10.1016/j.procs.2015.06.090.
[25] A. Septiarini, R. Saputra, A. Tedjawati, M. Wati, and H. Hamdani, “Pattern recognition of sarong fabric using machine learning
approach based on computer vision for cultural preservation,” International Journal of Intelligent Engineering and Systems, vol.
15, no. 5, pp. 284–295, 2022, doi: 10.22266/ijies2022.1031.26.
BIOGRAPHIES OF AUTHORS
Anindita Septiarini is a professor at the Department of Informatics at
Mulawarman University, Indonesia. She holds a Doctoral degree in Computer Science from
Gadjah Mada University, Indonesia, specializing in image analysis. She is also a researcher
and got a grant from the Ministry of Education, Culture, Research, and Technology of
Indonesia from 2016 until the present. Her research interests lie in artificial intelligence,
especially pattern recognition, image processing, and computer vision. She has received
national awards such as scientific article incentives from the Ministry of Education, Culture,
Research, and Technology of Indonesia in 2017 and 2019. She held several administrative
posts with the Department of Informatics, Mulawarman University, Indonesia, from 2018 to
2020, including the head of department and the head of laboratory. She can be contacted at
email: anindita@unmul.ac.id.
Hamdani Hamdani is a professor in the Department of Informatics at Mulawarman University, Indonesia, where he has been a lecturer and researcher since 2005. His research interests lie in the field of artificial intelligence, especially pattern recognition, decision support systems, and expert systems. He received his bachelor's degree in 2002 from Ahmad Dahlan University, Indonesia, his master's degree in 2009 from Gadjah Mada University, Indonesia, and his doctoral degree in computer science in 2018 from Gadjah Mada University, Indonesia. He can be contacted at email: hamdani@unmul.ac.id.
Aji Ery Burhandenny is an assistant professor in the Department of Electrical Engineering at Mulawarman University, Indonesia, specializing in Software Engineering. His research interests lie in empirical software engineering, particularly understanding human factors in software development activities. He has also made several contributions to internet of
things, machine learning, and green technologies related topics. He can be contacted at email:
a.burhandenny@ft.unmul.ac.id.
Damar Nurcahyono is an assistant professor at The Information Technology
Department of the State Polytechnic of Samarinda, Indonesia, specializing in Software
Engineering. His research interests lie in software engineering especially games and
animation. He has also made several contributions to software development, game
development, and environmental technology-related topics. He can be contacted at email:
damarnc@polnes.ac.id.
Surya Eka Priyatna is a lecturer at the Department of Information Engineering,
Antasari State Islamic University, Indonesia, specializing in mobile application design. His
research interests lie in human needs in the use of computer applications. He has also made
several contributions on topics related to mobile applications. He can be contacted at email:
suryaekapriyatna@uin-antasari.ac.id.

More Related Content

PDF
Robusta coffee leaf diseases detection based on MobileNetV2 model
PDF
Computer vision for purity, phenol, and pH detection of Luwak Coffee green bean
PPTX
PPTX
Image processing and machine learning based Ethiopian Coffee bean varieties .
PPT
Iceei2013 expert system in detecting coffee plant diseases
PDF
Image Analysis for Ethiopian Coffee Plant Diseases Identification
PDF
Identification of Cocoa Pods with Image Processing and Artificial Neural Netw...
PDF
Coffee cropmonitoring draft
Robusta coffee leaf diseases detection based on MobileNetV2 model
Computer vision for purity, phenol, and pH detection of Luwak Coffee green bean
Image processing and machine learning based Ethiopian Coffee bean varieties .
Iceei2013 expert system in detecting coffee plant diseases
Image Analysis for Ethiopian Coffee Plant Diseases Identification
Identification of Cocoa Pods with Image Processing and Artificial Neural Netw...
Coffee cropmonitoring draft

Similar to Image analysis for classifying coffee bean quality using a multi feature and machine learning approach (20)

PPTX
Coffee Quality Analysis Project with PowerBI
PDF
Coffee Classifier Machine PRESENTATION.pdf
PDF
AI-Enabled Fruit Decay Detection - CSEIJ
PDF
528Seed Technological Development – A Survey
PDF
Coffee Grounds Can Kill Mosquitoes
PDF
Color Distribution Analysis for Ripeness Prediction of Golden Apollo Melon
PPTX
technical seminar [Pavan Y N] final.pptx
PDF
FincaLabInfo
PDF
Detection roasting level of Lintong coffee beans by using euclidean distance
PDF
Classification of arecanut using machine learning techniques
PDF
The Effects of Segmentation Techniques in Digital Image Based Identification ...
PPTX
DOCX
2110071135_ip_m-template-instructions.docx
PDF
Quality Analysis and Classification of Rice Grains using Image Processing Tec...
PDF
Gq3611971205
PDF
ORGANIC PRODUCT DISEASE DETECTION USING CNN
PDF
Using k means cluster and fuzzy c means for defect segmentation in fruits
PDF
Using k means cluster and fuzzy c means for defect segmentation in fruits
PDF
Using k means cluster and fuzzy c means for defect segmentation in fruits
DOCX
Concept-and-Sub-Concept of adlai research
Coffee Quality Analysis Project with PowerBI
Coffee Classifier Machine PRESENTATION.pdf
AI-Enabled Fruit Decay Detection - CSEIJ
528Seed Technological Development – A Survey
Coffee Grounds Can Kill Mosquitoes
Color Distribution Analysis for Ripeness Prediction of Golden Apollo Melon
technical seminar [Pavan Y N] final.pptx
FincaLabInfo
Detection roasting level of Lintong coffee beans by using euclidean distance
Classification of arecanut using machine learning techniques
The Effects of Segmentation Techniques in Digital Image Based Identification ...
2110071135_ip_m-template-instructions.docx
Quality Analysis and Classification of Rice Grains using Image Processing Tec...
Gq3611971205
ORGANIC PRODUCT DISEASE DETECTION USING CNN
Using k means cluster and fuzzy c means for defect segmentation in fruits
Using k means cluster and fuzzy c means for defect segmentation in fruits
Using k means cluster and fuzzy c means for defect segmentation in fruits
Concept-and-Sub-Concept of adlai research
Ad

More from IAESIJAI (20)

PDF
A comparative study of natural language inference in Swahili using monolingua...
PDF
Abstractive summarization using multilingual text-to-text transfer transforme...
PDF
Enhancing emotion recognition model for a student engagement use case through...
PDF
Automatic detection of dress-code surveillance in a university using YOLO alg...
PDF
Hindi spoken digit analysis for native and non-native speakers
PDF
Two-dimensional Klein-Gordon and Sine-Gordon numerical solutions based on dee...
PDF
Improved convolutional neural networks for aircraft type classification in re...
PDF
Primary phase Alzheimer's disease detection using ensemble learning model
PDF
Deep learning-based techniques for video enhancement, compression and restora...
PDF
Hybrid model detection and classification of lung cancer
PDF
Adaptive kernel integration in visual geometry group 16 for enhanced classifi...
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
PDF
Enhancing fall detection and classification using Jarratt‐butterfly optimizat...
PDF
Deep ensemble learning with uncertainty aware prediction ranking for cervical...
PDF
Event detection in soccer matches through audio classification using transfer...
PDF
Detecting road damage utilizing retinaNet and mobileNet models on edge devices
PDF
Optimizing deep learning models from multi-objective perspective via Bayesian...
PDF
Squeeze-excitation half U-Net and synthetic minority oversampling technique o...
PDF
A novel scalable deep ensemble learning framework for big data classification...
PDF
Exploring DenseNet architectures with particle swarm optimization: efficient ...
A comparative study of natural language inference in Swahili using monolingua...
Abstractive summarization using multilingual text-to-text transfer transforme...
Enhancing emotion recognition model for a student engagement use case through...
Automatic detection of dress-code surveillance in a university using YOLO alg...
Hindi spoken digit analysis for native and non-native speakers
Two-dimensional Klein-Gordon and Sine-Gordon numerical solutions based on dee...
Improved convolutional neural networks for aircraft type classification in re...
Primary phase Alzheimer's disease detection using ensemble learning model
Deep learning-based techniques for video enhancement, compression and restora...
Hybrid model detection and classification of lung cancer
Adaptive kernel integration in visual geometry group 16 for enhanced classifi...
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
Enhancing fall detection and classification using Jarratt‐butterfly optimizat...
Deep ensemble learning with uncertainty aware prediction ranking for cervical...
Event detection in soccer matches through audio classification using transfer...
Detecting road damage utilizing retinaNet and mobileNet models on edge devices
Optimizing deep learning models from multi-objective perspective via Bayesian...
Squeeze-excitation half U-Net and synthetic minority oversampling technique o...
A novel scalable deep ensemble learning framework for big data classification...
Exploring DenseNet architectures with particle swarm optimization: efficient ...
Ad

Recently uploaded (20)

PDF
cuic standard and advanced reporting.pdf
PDF
Approach and Philosophy of On baking technology
PPTX
Big Data Technologies - Introduction.pptx
PDF
Empathic Computing: Creating Shared Understanding
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Getting Started with Data Integration: FME Form 101
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Accuracy of neural networks in brain wave diagnosis of schizophrenia
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Assigned Numbers - 2025 - Bluetooth® Document
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
cuic standard and advanced reporting.pdf
Approach and Philosophy of On baking technology
Big Data Technologies - Introduction.pptx
Empathic Computing: Creating Shared Understanding
MIND Revenue Release Quarter 2 2025 Press Release
Getting Started with Data Integration: FME Form 101
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Accuracy of neural networks in brain wave diagnosis of schizophrenia
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
20250228 LYD VKU AI Blended-Learning.pptx
“AI and Expert System Decision Support & Business Intelligence Systems”
Diabetes mellitus diagnosis method based random forest with bat algorithm
Advanced methodologies resolving dimensionality complications for autism neur...
Programs and apps: productivity, graphics, security and other tools
Assigned Numbers - 2025 - Bluetooth® Document
The Rise and Fall of 3GPP – Time for a Sabbatical?
Spectral efficient network and resource selection model in 5G networks
Building Integrated photovoltaic BIPV_UPV.pdf

Image analysis for classifying coffee bean quality using a multi feature and machine learning approach

  • 1. IAES International Journal of Artificial Intelligence (IJ-AI) Vol. 13, No. 4, December 2024, pp. 4241~4248 ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i4.pp4241-4248  4241 Journal homepage: http://ijai.iaescore.com Image analysis for classifying coffee bean quality using a multi- feature and machine learning approach Anindita Septiarini1 , Hamdani Hamdani1 , Aji Ery Burhandeny2 , Damar Nurcahyono3 , Surya Eka Priyatna4 1 Department of Informatics, Faculty of Engineering, Mulawarman University, Samarinda, Indonesia 2 Department of Electronic Engineering, Faculty of Engineering, Mulawarman University, Samarinda, Indonesia 3 Department of Information Technology, Politeknik Negeri Samarinda, Samarinda, Indonesia 4 Department of Information Technology, Faculty of Da'wah and Communication Sciences, Antasari State Islamic University, Banjarmasin, Indonesia Article Info ABSTRACT Article history: Received Nov 27, 2023 Revised Feb 11, 2024 Accepted Feb 28, 2024 Price and customer satisfaction depend on coffee bean quality. The coffee industry must analyze coffee bean quality. Global demand for robusta coffee is high. Coffee industry professionals mostly understand coffee bean quality. Thus, an image analysis using a computer vision-based approach for classifying robusta coffee bean quality is required. Image acquisition, region of interest (ROI) detection, pre-processing, segmentation, feature extraction, feature selection, and classification are covered in this study. A multi-feature derived based on color, shape, and texture features was employed in feature extraction, followed by feature selection using principal component analysis (PCA). Several machine-learning methods classified the coffee beans. The method performance was assessed using precision, recall, and accuracy. The selected features using the backpropagation neural network (BPNN) classifier outperformed others with 98.54% accuracy. Keywords: Coffee beans Features selection K-means Machine learning Principal component analysis This is an open access article under the CC BY-SA license. Corresponding Author: Anindita Septiarini Department of Informatics, Faculty of Engineering, Mulawarman University St. Sambaliung, No. 9, Samarinda, Indonesia Email: [email protected] 1. INTRODUCTION The utilization of computers and associated technologies is seeing fast expansion and diversification. The application of this is being observed in the field of agriculture. There exist multiple instances wherein computers have been employed in the agricultural sector, encompassing the monitoring of fruit ripeness [1], [2], land management [3], and plant development [4], [5]. Coffee, as one of the most widely consumed beverages globally, holds significant importance as an economic commodity. The global popularity of coffee can be attributed to its stimulating properties and the preference for its bitter flavor. Coffee serves as a substantial provider of caffeine for a considerable number of individuals. While previous research has established a connection between coffee and caffeine intake and adverse health effects, recent studies have presented evidence suggesting that the compounds found in coffee, such as caffeine, chlorogenic acids, kahweol, cafestol, and various micronutrients (such as magnesium, potassium, and phosphorus), may enhance the immune system and provide protection against the development of conditions such as obesity, diabetes, neurological diseases, osteoporosis, and pancreatic cancer [6]. 
The coffee industry values quality because of the relationship between coffee bean scarcity, monetary compensation, and consumer happiness. Robusta coffee beans, widely grown, have a distinct taste and aroma. Quality of robusta coffee beans depends on soil makeup, climate, and processing method. Coffee
  • 2.  ISSN: 2252-8938 Int J Artif Intell, Vol. 13, No. 4, December 2024: 4241-4248 4242 prices depend on bean quality. It is crucial to note that not all growers and coffee shop owners can identify coffee bean quality. Thus, errors may occur when they lack this expertise. Grading is time-consuming and produces inconsistent outcomes. Due to visual perception limits, fatigue, and coffee quality evaluation differences, these inconsistencies occur. Visual characteristics are often used to evaluate robusta coffee beans. In this situation, computer vision may work. It extracts robusta coffee bean visual traits that highly predict quality. Color, form, and texture may be needed for this procedure. Numerous research investigations have been conducted in computer vision, focusing on the application of food processing. These studies encompass a range of food items, such as banana [7], honey [8], date fruit [9], palm oil [10], and coffee [11], [12]. The construction of this system involves several general processes, namely pre-processing, segmentation, feature extraction, and classification [13]. Common pre-processing tasks often involve scaling [14] and converting color spaces [15]. The Otsu thresholding method [13], K-means clustering algorithm [16], and edge detection approach [17] were subsequently employed, along with many established segmentation methodologies. The extractable features that can be considered for edibles encompass color [10], shape [18], and texture [11]. Moreover, naïve Bayes (NB) [10], k-nearest neighbor (KNN) [19], and support vector machines (SVM) [10] are frequently utilized in the classification process. Recent studies have used machine learning to classify coffee beans across agricultural situations. Color and shape helped identify high-quality beans. The investigation used image processing and machine learning on an Arduino mega board. Essential criteria were assessed to determine high-quality green coffee beans. KNN was used to evaluate coffee beans and classify them by defect type. Logic, image processing, and supervised learning algorithms are executed and coded on the Arduino board. The machine vision system has an average accuracy of 94.79% for quality and 95.78% for defect-type evaluation. However, long berry bean classification was 98.05% accurate [20]. Subsequently, a variety of machine learning methodologies such as SVM, deep neural networks (DNN), and random forest (RF) were utilized to evaluate the significance of shape and color characteristics in the assessment of faults in coffee beans. The data presented in the study highlights the significance of color descriptors in the classification of faults in coffee beans. The classification models consider the most significant features obtained from the average G value of the component in the RGB color space and the average V value in the HSV color space. All the classifier models exhibited comparable performance, with the best accuracy value above 88% [12]. Several efforts were presented in order to identify and categorize coffee fruits, as well as to map the stage of maturation of these fruits during the harvest process. The methodology was executed utilizing the Darknet framework. The YOLOv3-tiny object identification system identified and categorized coffee fruit. The collection contains 90 videos from the 2020 arabica coffee (Catuaí 144) harvest, shot at a coffee harvester's discharge conveyor termination point. A business area in Patos de Minas, Minas Gerais, Brazil hosted the recordings. 
The model performed best at around 3300th iteration with an 800×800-pixel image input. The model had 84% mean average precision (mAP), 82% F1-score, 83% precision, and 82% recall in the validation set. The precision values for unripe, ripe, and overripe coffee fruits were 86%, 85%, and 80%, respectively [21]. Another study used a convolutional network on an inexpensive micro-controller board to classify coffee leaf diseases locally without the internet. Early diagnosis of coffee plant diseases was crucial for optimal output and production quality. Two datasets and development board images were used in this investigation. The collection included around 6000 images from six sickness classes. The incorporated cascade and single-stage systems were 98% and 96% accurate, respectively. These findings imply that these structures detect coffee plantation diseases [22]. This study presents a proposed method for classifying coffee bean quality based on computer vision techniques. The method utilizes color, shape, and texture data extracted from the RGB, HSV, and L*a*b color spaces. The BP was employed as the classifier in this work. The objective of this method was to ascertain the classification of coffee beans according to their quality by utilizing image data. The quality types were classified into four classes: intact, perforated, wrinkled, and cracked. 2. MATERIALS AND METHODS The approach predicted the quality class of all robusta coffee bean photos. Its main processes were region of interest (ROI) detection, pre-processing, feature extraction, selection, and classification. The method has two phases: training and testing. Training and testing sets provided input for each phase. Both phases were handled differently. ROI detection assigned the coffee bean area to the image using K-means clustering. The training step pre-processed RGB data into grayscale, HSV, and L*a*b. Afterward, color, texture, and shape were used to extract features. Subsequently, the feature selection procedure was used to choose the most significant features and simplify classification. Pre-processing merely converted RGB to the feature selection color space during testing. Principal component analysis (PCA) generated the selected features in the proposed technique. The feature selection result was used to apply the extracted feature. A
  • 3. Int J Artif Intell ISSN: 2252-8938  Image analysis for classifying coffee bean quality using a multi-feature … (Anindita Septiarini) 4243 prediction class (intact/perforated/wrinkled/cracked) was determined from selected features in the final step. Figure 1 illustrates the robusta coffee bean quality classification. Figure 1. Overview of all steps in the proposed method for quality classification of robusta coffee bean 2.1. Dataset The dataset in this study was images of robusta coffee beans. JPEG images were taken with a Xiaomi 5A smartphone's inbuilt camera. The coffee bean was placed on a white background in the center of a 28×19×18 cm studio minibox. A 10 cm gap between the camera and the coffee beans was maintained by deliberately positioning and orienting the camera. Smartphone cameras are 13-megapixel. The image has dimensions of 1560×1560 pixels. The dataset had 1440 coffee bean images, 360 each class. It was divided into four classes: intact, perforated, wrinkled, or cracked, with the example image shown in Figures 2(a) to 2(d), respectively. (a) (b) (c) (d) Figure 2. Examples of coffee bean images with various quality types: (a) intact, (b) perforated, (c) wrinkled, and (d) cracked 2.2. Region of interest detection ROI detection attempts to generate a sub-image mostly of the coffee bean area. During this stage, the initial resolution of the image was reduced from 1560×1560 pixels to 500×500 pixels [3] to minimize the computational time. Subsequently, a color space conversion was performed from RGB to L*a*b; this enabled the system to accurately differentiate between object and background regions in various scenarios. The utilization of L*a*b color spaces necessitates a conversion procedure that relies on the values within the RGB color space, which are explicitly defined as in [23]. The result of the conversion of an original image in RGB Figure 3(a) to L*a*b color space is depicted in Figure 3(b). Furthermore, by employing the clustering with the K-means algorithm [16], the area of coffee beans was approximated. Due to the division of the image's area into two distinct regions—the coffee bean region and the background region—the value of K was set into two. The steps of the K-means algorithm are defined as follows [24]: − Step 1: initialize number of cluster k and centre. − Step 2: For each pixel of an image, calculate the Euclidean distance d, between the centre and each pixel of an image using the relation given. 𝑑 = ‖𝑝(𝑥, 𝑦) − 𝑐𝑘‖ (1)
  • 4.  ISSN: 2252-8938 Int J Artif Intell, Vol. 13, No. 4, December 2024: 4241-4248 4244 − Step 3: Assign all the pixels to the nearest centre based on distance d. − Step 4: After all pixels have been assigned, recalculate new position of the centre using the relation given: 𝐶𝑖 = 1 𝑘 ∑ ∑ 𝑝(𝑥, 𝑦) 𝑥𝜖𝑐𝑘 𝑦𝜖𝑐𝑘 (2) − Step 4: Repeat the process until it satisfies the tolerance or error value. − Step 5: Reshape the cluster pixels into image. The resulting image of the K-means algorithm is shown in Figure 3(c). Afterward, a morphological operation was applied using dilation; hence, the coffee bean area approaches the original, and the result is depicted in Figure 3(d). Subsequently, the setting of the coffee bean area was carried out as the ground for defining the ROI image boundary based on the yellow box, as shown in Figure 3(e). Accordingly, the formed ROI images in binary and RGB color space are shown in Figures 3(f) and 3(g). (a) (b) (c) (d) (e) (f) (g) Figure 3. The resulting image of each process in ROI detection: (a) original image in RGB color space, (b) L*a*b color space, (c) K-means clustering, (d) morphological operation, (e) setting the area of ROI image, and (f) ROI image 2.3. Pre-processing This procedure generated parameter values for feature extraction. This study examined color, texture, and shape. RGB images must be converted to L*a*b and HSV to create color features, RGB images to grayscale to create texture features, and binary images to build form features. In order to improve classification results, the color space must be changed during pre-processing. Agricultural research uses RGB for object classification. Some investigations have employed L*a*b and HSV color spaces. Using different color spaces requires a conversion technique that uses RGB values [23]. In (3)-(6) define RGB-to-L*a*b conversion. In HSV color space, in (3)-(4) calculate hue (H) and then saturation (S) and value (V). S and V values were computed using as (5) and (6). 𝐻 = { 𝜃, 𝐵 ≤ 𝐺 360 − 𝜃, 𝐵 > 𝐺 (3) where: 𝜃 = 𝑐𝑜𝑠−1 { 1 2 [(𝑅−𝐺)+(𝑅−𝐵)] [𝑥(𝑅−𝐺)2+(𝑅−𝐵)(𝐺−𝐵]1/2} (4) 𝑆 = { 0, max (𝑅,𝐺, 𝐵) = 0 1 − min(𝑅,𝐺,𝐵) max(𝑅,𝐺,𝐵) , 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 (5) 𝑉 = max (𝑅, 𝐺, 𝐵) (6) Furthermore, converting the RGB image to a grayscale image was needed; hence, this work applied texture features. These feature parameters will later be used as input for the classification process. RGB conversion to grayscale is carried out to produce intensity (I) values using (7) [11]. 𝐼 = 1 3 (𝑅 + 𝐺 + 𝐵) (7) 2.4. Feature extraction Coffee bean image feature extraction retrieves color, texture, and shape information. Some studies analyze one aspect, while others analyze several. This study analyzes three approaches for extracting three-color features using RGB, HSV, and L*a*b color model statistical values. RGB, HSV, and L*a*b are
2.4. Feature extraction
Feature extraction of the coffee bean images retrieves color, texture, and shape information. Some studies analyze only one of these aspects, while others analyze several. This study extracts the color features as statistical values of the RGB, HSV, and L*a*b color models, which are useful color characteristics in many applications; converting RGB to HSV and L*a*b also limits the dimensionality of the color space and the number of features. Texture feature extraction uses the gray level co-occurrence matrix (GLCM), and the shape features are computed from statistical characteristics and shape distances in the binary image. Table 1 lists the number of features per method. Adding features does not necessarily enhance model performance; therefore, accurate classification requires careful feature selection. A combined feature-extraction sketch in Python is given at the end of this section, after subsection 2.4.3.

Table 1. The number of features
Type of features | Method                        | Number of features
Color            | RGB model                     | 3
Color            | HSV model                     | 3
Color            | L*a*b model                   | 3
Texture          | Statistical features of GLCM  | 84
Shape            | Area-based                    | 2

2.4.1. Color feature extraction
The HSV, RGB, and L*a*b color spaces have been used as color features to differentiate various objects [11]. In (8), the mean is used to summarize the statistical properties of each channel of a color model, where $\mu$ is the average value of a color channel. This study applies the RGB, HSV, and L*a*b feature extraction to the coffee bean data: the red, green, and blue values are obtained from the RGB image; the hue, saturation, and value from the HSV image; and the lightness (L) together with the color-opponent dimensions a and b, indicating redness-greenness and blueness-yellowness, from the L*a*b image.

$\mu = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} I_{ij}$ (8)

2.4.2. Texture feature extraction
Coffee beans can also be distinguished by their texture, since coffee bean fiber has distinct visual and textural properties. The color characteristics of the perforated, wrinkled, and cracked classes are similar to those of the intact class, which makes identification based on color alone difficult. In this study, only the GLCM is used to extract the texture features. The GLCM has been applied to texture feature extraction with good results; it is obtained by calculating the probability of the adjacency relationship between two pixels at a specific distance and angular orientation [1]. After the co-occurrence matrix is computed, the statistical attributes of the image are calculated. The GLCM statistical features are computed for four angles (0°, 45°, 90°, and 135°) and one distance (1 pixel). GLCM(i, j) is the joint probability distribution of a pixel pair with gray levels i and j; the number of gray levels in the image determines the number of rows and columns of the GLCM, where L is the number of computed gray levels minus 1 and the grayscale values of an image lie between 0 and 255 [7]. The GLCM features used in this research are: autocorrelation, cluster prominence, cluster shade, contrast, correlation, difference entropy, difference variance, dissimilarity, energy, entropy, inverse difference moment (IDM), information measures of correlation 1 and 2, inverse difference, maximum probability, sum average, sum entropy, sum of squares variance, sum variance, IDM normalized, and inverse difference normalized.

2.4.3. Shape feature extraction
The K-means method was employed to convert the coffee bean image into a binary image and to remove noise around the shape of the bean body. The goal of the K-means algorithm is to cluster objects by assigning them to the nearest of the K cluster centers in the feature space; the centroid values are updated iteratively until the optimal clustering result is achieved. Shape parameters, namely eccentricity (e) and perimeter (p), are then calculated from the semi-major axis a and semi-minor axis b of the fitted ellipse to characterize the coffee bean shape [20].

$e = \sqrt{1 - \frac{b^2}{a^2}}$ (9)

$p = 2\pi \sqrt{\frac{a^2 + b^2}{2}}$ (10)
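The sketch below illustrates how the three feature groups could be computed. It assumes scikit-image (0.19 or later, for graycomatrix/graycoprops) and NumPy, reuses the roi_rgb and roi_bin arrays from the earlier ROI sketch, and assumes roi_gray is the uint8 intensity image from (7). It mirrors (8)-(10) but covers only the subset of the 21 listed GLCM properties that graycoprops exposes; the remaining properties would have to be computed from the co-occurrence matrix directly. It is an illustrative sketch, not the authors' exact feature pipeline.

```python
# Sketch of multi-feature extraction: color means (8), GLCM texture,
# and shape features (9)-(10). Assumes scikit-image >= 0.19 and NumPy.
import numpy as np
from skimage import color
from skimage.feature import graycomatrix, graycoprops
from skimage.measure import label, regionprops

def color_features(roi_rgb):
    """(8): per-channel mean in the RGB, HSV, and L*a*b spaces (9 values).
    Note that scikit-image scales hue to [0, 1] rather than degrees."""
    rgb = roi_rgb / 255.0
    hsv = color.rgb2hsv(rgb)
    lab = color.rgb2lab(rgb)
    return np.concatenate([m.reshape(-1, 3).mean(axis=0) for m in (rgb, hsv, lab)])

def texture_features(roi_gray):
    """GLCM statistics at distance 1 and angles 0, 45, 90, 135 degrees."""
    glcm = graycomatrix(roi_gray, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "dissimilarity", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

def shape_features(roi_bin):
    """(9)-(10): eccentricity and ellipse perimeter from the binary ROI."""
    region = max(regionprops(label(roi_bin > 0)), key=lambda r: r.area)
    a = region.major_axis_length / 2.0    # semi-major axis
    b = region.minor_axis_length / 2.0    # semi-minor axis
    e = np.sqrt(1.0 - (b ** 2) / (a ** 2))
    p = 2.0 * np.pi * np.sqrt((a ** 2 + b ** 2) / 2.0)
    return np.array([e, p])
```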
2.5. Feature selection
The total number of features should be analyzed in order to identify the most beneficial or highly discriminative features within the dataset. The present study employed PCA for feature selection. PCA is a well-established technique in pattern recognition and computer vision; it serves as a standard method for feature extraction and data representation and is commonly employed to identify and recognize objects. As a statistical method, it reduces the number of dimensions in data sets with many variables, which makes object recognition work better and has been shown to raise the accuracy value [25]. A minimal PCA sketch is given at the end of this subsection.
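A minimal PCA feature-selection sketch is shown below, assuming scikit-learn and a feature matrix X of shape (1440, n_features) built from the extractors above; the retained-variance threshold of 0.95 is an illustrative choice, not a value reported in the paper.

```python
# Sketch of PCA-based feature selection with scikit-learn.
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pca_pipeline = make_pipeline(
    StandardScaler(),          # scale features before PCA
    PCA(n_components=0.95),    # keep components explaining 95% of the variance
)
# X_selected = pca_pipeline.fit_transform(X)   # X: (1440, n_features) matrix
```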
2.6. Classification
Classification assigns the data to their respective classes. Machine learning has been applied to several plant-related objects; it automates this task by learning from data and using the learned model to make predictions. Instead of following fixed instructions, the algorithm builds a model from sample inputs to estimate data and make decisions, and mathematical and statistical models trained on the training data are used to predict unknown data. This study classified coffee bean quality using machine learning with five classifiers: backpropagation neural network (BPNN), linear discriminant analysis (LDA), k-nearest neighbors (KNN), naive Bayes (NB), and support vector machine (SVM). Previous research on various plant specimens has used these methods [19], [25].

2.7. Performance measurement
Feature selection and different classification approaches were evaluated to determine the most appropriate and robust method for the dataset. This study evaluated 1440 robusta coffee bean images (360 intact, 360 perforated, 360 wrinkled, and 360 cracked). The performance of the classification methods was assessed using accuracy [25], which was calculated from the multiclass confusion matrix and is defined in (11); a brief classification and evaluation sketch follows at the end of this subsection.

$\mathrm{Accuracy} = \frac{\text{correct number of data}}{\text{total number of data}} \times 100$ (11)
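The sketch below shows how the five classifiers and the accuracy measure in (11) could be set up with scikit-learn. MLPClassifier stands in for the BPNN, and the hyper-parameters and the 80/20 stratified split are illustrative assumptions rather than values reported by the authors.

```python
# Sketch of training the five classifiers and computing accuracy (11).
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

classifiers = {
    "BPNN": MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "NB": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
}

def evaluate(X, y):
    """Train each classifier and report accuracy (11) in percent."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    for name, clf in classifiers.items():
        y_pred = clf.fit(X_tr, y_tr).predict(X_te)
        acc = 100.0 * accuracy_score(y_te, y_pred)   # correct / total * 100
        print(f"{name}: {acc:.2f}%")
        print(confusion_matrix(y_te, y_pred))        # multiclass confusion matrix
```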
3. RESULTS AND DISCUSSION
A total of 1440 images from the four classes (intact, perforated, wrinkled, and cracked) were used to evaluate the method. The classification results were compared for feature sets obtained without and with feature selection. Without feature selection, the feature sets were formed by the GLCM texture features, the area-based shape features, and the color features (RGB, HSV, and L*a*b); for feature selection, PCA was applied. The experiment utilized five classifiers: BPNN, KNN, LDA, NB, and SVM, and the method performance was assessed using accuracy. The comparison of the method performance with the different classifiers and feature sets, indicated by the accuracy value, is summarized in Table 2.
Table 2 first reports the test scenarios without feature selection. These tests were conducted using the texture features obtained with the GLCM approach, texture combined with shape features (area-based), and texture combined with the HSV, L*a*b, and RGB color spaces. The BPNN classifier achieved the highest accuracy of 94.83% using the GLCM features, followed by the LDA, KNN, SVM, and NB classifiers with decreasing accuracy values of 92.08%, 85.21%, 83.54%, and 80.83%, respectively. For the area-based feature set, the BPNN classifier again achieved the highest accuracy of 97.86%, while the NB classifier had the lowest accuracy of 58.54%. For the color attributes derived from the three color spaces, the SVM attained the highest accuracy of 92.92% with HSV, whereas the BPNN achieved 97.71% with both the L*a*b and RGB feature sets.

Table 2. Performance comparison of the classifiers with various feature sets based on accuracy value (%)
Classifier | Without feature selection                   | With feature selection
           | GLCM  | Area-based | HSV   | L*a*b | RGB   | PCA
BPNN       | 94.83 | 97.86      | 84.79 | 97.71 | 97.71 | 98.54
KNN        | 85.21 | 84.38      | 85.42 | 90.83 | 89.79 | 90.83
LDA        | 92.08 | 91.46      | 77.92 | 91.46 | 68.33 | 80.63
NB         | 80.83 | 58.54      | 53.96 | 53.54 | 48.54 | 55.83
SVM        | 83.54 | 72.71      | 92.92 | 97.50 | 94.83 | 97.50

With PCA feature selection, the backpropagation classifier attained the maximum accuracy of 98.54%, whereas the NB classifier yielded the lowest result with an accuracy of 55.83%. The BPNN also demonstrates high accuracy across the feature-set scenarios without feature selection: it achieved the highest accuracy with the GLCM, area-based, L*a*b, and RGB feature sets and reached its best accuracy of 98.54% when PCA was applied for feature selection. The NB classifier was consistently outperformed in every scenario. These results indicate that combining texture, shape, and color features, followed by feature selection to limit the number of features, can lead to high accuracy in the classification process, and that the BPNN classifier performs better than the other classifiers by minimizing errors in nearly every scenario.
4. CONCLUSION
This study classifies robusta coffee beans by quality into four classes: intact, perforated, wrinkled, and cracked. The procedure involves ROI detection, pre-processing, segmentation, feature extraction, feature selection, and classification, with each step contributing to classifying the coffee beans and determining their quality accurately. The study tested feature sets consisting of texture, texture with shape, and texture with color space values (HSV, L*a*b, and RGB). The BPNN consistently outperformed the other classifiers for coffee bean quality assessment and, with PCA feature selection applied to the GLCM, area-based, and color features, achieved the best accuracy of 98.54%. Exploring additional scenarios and attributes could further improve the variety and quality of this research.

ACKNOWLEDGEMENTS
The authors would like to thank the Faculty of Engineering at Mulawarman University in Samarinda, Indonesia, for providing financial support for the research conducted in 2023 (No. 497/UN17.L1/HK/2023).

REFERENCES
[1] M. R. Fiona, S. Thomas, I. J. Maria, and B. Hannah, "Identification of ripe and unripe citrus fruits using artificial neural network," in Journal of Physics: Conference Series, IOP Publishing, 2019, doi: 10.1088/1742-6596/1362/1/012033.
[2] S. Munera, F. Hernández, N. Aleixos, S. Cubero, and J. Blasco, "Maturity monitoring of intact fruit and arils of pomegranate cv. 'Mollar de Elche' using machine vision and chemometrics," Postharvest Biology and Technology, vol. 156, 2019, doi: 10.1016/j.postharvbio.2019.110936.
[3] Hamdani, A. Septiarini, and D. M. Khairina, "Model assessment of land suitability decision making for oil palm plantation," in 2016 2nd International Conference on Science in Information Technology (ICSITech 2016): Information Science for Green Society and Environment, IEEE, 2017, pp. 109–113, doi: 10.1109/ICSITech.2016.7852617.
[4] A. Yudhana, R. Umar, and F. M. Ayudewi, "The monitoring of corn sprouts growth using the region growing methods," in Journal of Physics: Conference Series, IOP Publishing, Nov. 2019, doi: 10.1088/1742-6596/1373/1/012054.
[5] A. Sezgin and V. Küçük, "Monitoring plant growth with image processing methods and artificial intelligence supported agriculture system," in International Artificial Intelligence and Data Processing Symposium, 2022, pp. 165–176, doi: 10.53070/bbd.1172774.
[6] B. Açıkalın and N. Sanlier, "Coffee and its effects on the immune system," Trends in Food Science & Technology, vol. 114, pp. 625–632, Aug. 2021, doi: 10.1016/j.tifs.2021.06.023.
[7] E. Piedad, J. I. Larada, G. J. Pojas, and L. V. V. Ferrer, "Postharvest classification of banana (Musa acuminata) using tier-based machine learning," Postharvest Biology and Technology, vol. 145, pp. 93–100, 2018, doi: 10.1016/j.postharvbio.2018.06.004.
[8] A. Noviyanto and W. H. Abdulla, "Honey botanical origin classification using hyperspectral imaging and machine learning," Journal of Food Engineering, vol. 265, 2020, doi: 10.1016/j.jfoodeng.2019.109684.
[9] D. Zhang, D. J. Lee, B. J. Tippetts, and K. D. Lillywhite, "Date maturity and quality evaluation using color distribution analysis and back projection," Journal of Food Engineering, vol. 131, pp. 161–169, 2014, doi: 10.1016/j.jfoodeng.2014.02.002.
[10] A. Septiarini, H. Hamdani, T. Hardianti, E. Winarno, S. Suyanto, and E. Irwansyah, "Pixel quantification and color feature extraction on leaf images for oil palm disease identification," in 7th International Conference on Electrical, Electronics and Information Engineering: Technological Breakthrough for Greater New Life, 2021, pp. 1–5, doi: 10.1109/ICEEIE52663.2021.9616645.
[11] W. G. D. Costa, I. D. P. Barbosa, J. E. D. Souza, C. D. Cruz, M. Nascimento, and A. C. B. D. Oliveira, "Machine learning and statistics to qualify environments through multi-traits in Coffea arabica," PLoS ONE, vol. 16, no. 1, p. e0245298, Jan. 2021, doi: 10.1371/journal.pone.0245298.
[12] F. F. L. D. Santos, J. T. F. Rosas, R. N. Martins, G. D. M. Araújo, L. D. A. Viana, and J. D. P. Gonçalves, "Quality assessment of coffee beans through computer vision and machine learning algorithms," Coffee Science, vol. 15, no. 1, pp. 1–9, 2020, doi: 10.25186/.v15i.1752.
[13] A. Septiarini, H. Hamdani, A. Rifani, Z. Arifin, N. Hidayat, and H. Ismanto, "Multi-class support vector machine for arabica coffee bean roasting grade classification," in ICOIACT 2022 - 5th International Conference on Information and Communications Technology, 2022, pp. 407–411, doi: 10.1109/ICOIACT55506.2022.9971897.
[14] R. S. El-Sayed and M. N. El-Sayed, "Classification of vehicles' types using histogram oriented gradients: comparative study and modification," IAES International Journal of Artificial Intelligence, vol. 9, no. 4, pp. 700–712, 2020, doi: 10.11591/ijai.v9.i4.pp700-712.
[15] M. Sharif, M. A. Khan, Z. Iqbal, M. F. Azam, M. I. U. Lali, and M. Y. Javed, "Detection and classification of citrus diseases in agriculture based on optimized weighted segmentation and feature selection," Computers and Electronics in Agriculture, vol. 150, pp. 220–234, 2018, doi: 10.1016/j.compag.2018.04.023.
[16] A. Septiarini, H. Hamdani, S. U. Sari, H. Rahmania Hatta, N. Puspitasari, and W. Hadikurniawati, "Image processing techniques for tomato segmentation applying k-means clustering and edge detection approach," in 2021 International Seminar on Machine Learning, Optimization, and Data Science (ISMODE 2021), IEEE, 2022, pp. 92–96, doi: 10.1109/ISMODE53584.2022.9742740.
[17] J. Lu et al., "Lightweight green citrus fruit detection method for practical environmental applications," Computers and Electronics in Agriculture, vol. 215, 2023, doi: 10.1016/j.compag.2023.108205.
[18] J. Liang, K. Huang, H. Lei, Z. Zhong, Y. Cai, and Z. Jiao, "Occlusion-aware fruit segmentation in complex natural environments under shape prior," Computers and Electronics in Agriculture, vol. 217, 2024, doi: 10.1016/j.compag.2024.108620.
[19] X. Yang, R. Zhang, Z. Zhai, Y. Pang, and Z. Jin, "Machine learning for cultivar classification of apricots (Prunus armeniaca L.) based on shape features," Scientia Horticulturae, vol. 256, 2019, doi: 10.1016/j.scienta.2019.05.051.
[20] H. Li, W. S. Lee, and K. Wang, "Identifying blueberry fruit of different growth stages using natural outdoor color images," Computers and Electronics in Agriculture, vol. 106, pp. 91–101, 2014, doi: 10.1016/j.compag.2014.05.015.
[21] García, C. Becerra, and Hoyos, "Quality and defect inspection of green coffee beans using a computer vision system," Applied Sciences, vol. 9, no. 19, Oct. 2019, doi: 10.3390/app9194195.
[22] H. C. Bazame, J. P. Molin, D. Althoff, and M. Martello, "Detection, classification, and mapping of coffee fruits during harvest with computer vision," Computers and Electronics in Agriculture, vol. 183, 2021, doi: 10.1016/j.compag.2021.106066.
[23] F. García-Lamont, J. Cervantes, A. López, and L. Rodriguez, "Segmentation of images by color features: A survey," Neurocomputing, vol. 292, pp. 1–27, 2018, doi: 10.1016/j.neucom.2018.01.091.
[24] N. Dhanachandra, K. Manglem, and Y. J. Chanu, "Image segmentation using K-means clustering algorithm and subtractive clustering algorithm," Procedia Computer Science, vol. 54, pp. 764–771, 2015, doi: 10.1016/j.procs.2015.06.090.
[25] A. Septiarini, R. Saputra, A. Tedjawati, M. Wati, and H. Hamdani, "Pattern recognition of sarong fabric using machine learning approach based on computer vision for cultural preservation," International Journal of Intelligent Engineering and Systems, vol. 15, no. 5, pp. 284–295, 2022, doi: 10.22266/ijies2022.1031.26.

BIOGRAPHIES OF AUTHORS
Anindita Septiarini is a professor at the Department of Informatics, Mulawarman University, Indonesia. She holds a doctoral degree in Computer Science from Gadjah Mada University, Indonesia, specializing in image analysis. She has also been a researcher funded by grants from the Ministry of Education, Culture, Research, and Technology of Indonesia from 2016 until the present. Her research interests lie in artificial intelligence, especially pattern recognition, image processing, and computer vision. She has received national awards in the form of scientific article incentives from the Ministry of Education, Culture, Research, and Technology of Indonesia in 2017 and 2019. She held several administrative posts in the Department of Informatics, Mulawarman University, Indonesia, from 2018 to 2020, including head of department and head of laboratory. She can be contacted at email: [email protected].

Hamdani Hamdani is a professor in the Department of Informatics at Mulawarman University, Indonesia, where he has been a lecturer and researcher since 2005. His research interests lie in the field of artificial intelligence, especially pattern recognition, decision support systems, and expert systems. He received his bachelor's degree in 2002 from Ahmad Dahlan University, Indonesia, his master's degree in 2009 from Gadjah Mada University, Indonesia, and his doctoral degree in computer science in 2018 from Gadjah Mada University, Indonesia. He can be contacted at email: [email protected].

Aji Ery Burhandenny is an assistant professor in the Department of Electrical Engineering at Mulawarman University, Indonesia, specializing in software engineering. His research interests lie in empirical software engineering, particularly understanding human factors in software development activities. He has also made several contributions to topics related to the internet of things, machine learning, and green technologies. He can be contacted at email: [email protected].

Damar Nurcahyono is an assistant professor at the Information Technology Department of the State Polytechnic of Samarinda, Indonesia, specializing in software engineering. His research interests lie in software engineering, especially games and animation. He has also made several contributions to topics related to software development, game development, and environmental technology. He can be contacted at email: [email protected].

Surya Eka Priyatna is a lecturer at the Department of Information Engineering, Antasari State Islamic University, Indonesia, specializing in mobile application design. His research interests lie in human needs in the use of computer applications. He has also made several contributions to topics related to mobile applications. He can be contacted at email: [email protected].