International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 01 | Jan 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1268
Machine Learning, K-means Algorithm Implementation with R
Mrs. Kavita Ganesh Kurale1, Mrs. Rohini Sudhir Patil2
1Sr. Lecturer, Dept. of Computer Engineering, Sant Gajanan Maharaj Rural Polytechnic, Mahgaon,
Maharashtra, India.
2HOD, Dept. of Computer Engineering, Sant Gajanan Maharaj Rural Polytechnic, Mahgaon, Maharashtra, India.
---------------------------------------------------------------------***----------------------------------------------------------------------
ABSTRACT: Machine learning (ML) is a growing
technology and the scientific study of algorithms that enable
computers to learn automatically from previously available data.
Machine learning uses numerous algorithms to build
mathematical models and make predictions using the
data already available. Machine learning is an application of
artificial intelligence. Machine learning is either supervised or
unsupervised. K-Means is one of the most commonly used
unsupervised learning algorithms
for cluster analysis. In this paper we work
through an implementation of K-Means in R programming.
Keywords: Machine Learning, K-means, R
Programming, supervised and unsupervised learning.
I. INTRODUCTION
Machine learning is an application of artificial
intelligence (AI). In machine learning, systems
automatically learn and improve their performance by using
previously calculated results. Machine learning focuses on
the development of computer programs that can access
data, use that data to learn new things, and calculate
results.
Learning starts by observing data,
through direct experience with previously used data, or through
instruction; patterns are then found in that data and used
to make decisions in the future based on the examples. The
main aim is to allow computers to learn
automatically, without manual intervention, and to compute
results based on already computed results.
Some Machine Learning Methods
II. Classes of Machine Learning
1. Supervised Learning
2. Unsupervised Learning
Supervised machine learning algorithms: in supervised
learning, we train the machine on data
that is labeled with the correct answer. After
that, the machine is given a new data set, and the
supervised learning algorithm, having analysed the
training examples, produces an outcome for the new data
based on the labeled data. After sufficient training the system
provides targets for any new input. The learning algorithm
can also compare its output with the correct output and find
errors in order to modify the model accordingly.
In contrast, unsupervised machine
learning algorithms are used when the data used to
train is neither classified nor labeled. The system does not
determine a right output; instead it explores the data and
can draw inferences from datasets to describe hidden
structures in unlabeled data.
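The difference between the two settings can be sketched in a few lines of R. The toy numbers, the variable names, and the choice of the base functions lm and kmeans are assumptions made for illustration; they are not part of the paper's implementation.

```r
# Supervised: every input x comes with a known answer y.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
fit <- lm(y ~ x)                                   # learn from labeled pairs
pred <- predict(fit, newdata = data.frame(x = 6))  # target for a new input

# Unsupervised: only the values are given; no answers are supplied.
vals <- matrix(c(1, 1.2, 0.8, 10, 10.3, 9.7), ncol = 1)
start <- matrix(c(1, 10), ncol = 1)                # assumed starting centers
groups <- kmeans(vals, centers = start, algorithm = "Lloyd")$cluster
print(pred)
print(groups)
```

In the first half the model can be checked against the known y values; in the second half the grouping is found from the values alone.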
Semi-supervised machine learning algorithms fall
between supervised and unsupervised learning,
since they use both labeled and unlabeled data for training,
generally a small amount of labeled data and a large
amount of unlabeled data. Systems that use this
approach are able to considerably improve learning
accuracy. Generally, semi-supervised learning is chosen
when acquiring labeled data requires skilled and relevant
resources in order to train on it or learn from it, whereas acquiring
unlabeled data generally does not require additional
resources.
Reinforcement machine learning is a
learning approach that interacts with its environment by
producing actions and discovering errors or rewards. Trial
and error search and delayed reward are the most relevant
characteristics of reinforcement learning. In order to
maximize performance, this approach allows machines
and software agents to automatically determine the ideal
actions within a specific context. Simple reward feedback is
needed for the agent to learn which action is best; this is
known as the reinforcement signal. Machine learning
enables analysis of huge quantities of data. While it
generally delivers faster, more accurate results in order to
identify profitable opportunities or dangerous risks, it
may also require extra time and resources to train
properly. Combining machine learning with AI and
cognitive technologies can make it more effective in processing
large volumes of information.
III. Clustering
Clustering is the most popular approach in
unsupervised learning, where data is grouped based on
the similarity of the data points. Clustering has
numerous real-life uses in a
variety of situations. It is used in various fields
such as image recognition, pattern analysis, medical
informatics, genomics, and data compression. In machine
learning, clustering is part of unsupervised learning.
This is because the data points are not
labeled and there is no explicit mapping between inputs and
outputs.
IV. K-Means Clustering
K-means is one of the
simplest unsupervised learning algorithms that solve
the well-known clustering problem. The procedure
follows a simple and straightforward method to classify a
given data set into a particular number of clusters, k.
The main idea is to define k centers,
one for each cluster. These centers should be placed with
care, because different positions cause different
results. So, the better choice is to place them as
far away from each other as possible. The next step is to take each data point
from the given data set and assign it to the nearest center.
When all data points have been assigned, the first step is
completed and an early grouping is done. At this
point we need to re-calculate k new centroids of the
clusters resulting from the previous step [1].
Figure: K-Means Clustering.
The K-means algorithm is very simple [3]:
1. Choose the number of clusters K and select K initial centroids.
2. Repeat step 3 and step 4 for all data points in the
given dataset.
3. Find the centroid closest to the data point.
4. Form K clusters by assigning
each point to its nearest centroid.
5. Compute a new centroid for each cluster in the
data set.
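The steps above can be sketched as a short R function. This is a minimal sketch rather than the paper's code: the name simple_kmeans, the max_iter argument, and the rule of repeating until the assignments stop changing are assumptions added for illustration, and the six sample points are made up.

```r
# Hypothetical helper: k-means on a numeric matrix, one point per row.
simple_kmeans <- function(points, centroids, max_iter = 100) {
  points <- as.matrix(points)
  centroids <- as.matrix(centroids)   # step 1: K initial centroids, one per row
  k <- nrow(centroids)
  cluster <- integer(nrow(points))
  for (iter in seq_len(max_iter)) {
    # Steps 2-3: squared Euclidean distance from every point to every centroid.
    d2 <- matrix(0, nrow(points), k)
    for (j in seq_len(k)) {
      d2[, j] <- rowSums(sweep(points, 2, centroids[j, ])^2)
    }
    # Step 4: assign each point to its nearest centroid.
    new_cluster <- apply(d2, 1, which.min)
    # Step 5: recompute each centroid as the mean of its members.
    for (j in seq_len(k)) {
      members <- points[new_cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centroids[j, ] <- colMeans(members)
    }
    if (all(new_cluster == cluster)) break   # assignments stopped changing
    cluster <- new_cluster
  }
  list(cluster = cluster, centers = centroids)
}

# Made-up, well-separated sample: three points near (0, 0), three near (10, 10).
pts <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
res <- simple_kmeans(pts, centroids = rbind(c(0, 0), c(10, 10)))
print(res$cluster)
print(res$centers)
```

A cluster that loses all its members keeps its previous centroid in this sketch; other policies are possible.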
K-means algorithm properties [3]:
1. It is efficient when processing large data sets.
2. It works only on numeric values.
3. The clusters have a convex shape.
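Objective of K-means
The objective of K-means clustering is to
minimize the total squared Euclidean distance between each point and the
centroid of its cluster.
Euclidean distance formula: for a point p = (x_p, y_p) and a centroid c = (x_c, y_c), $d(p, c) = \sqrt{(x_p - x_c)^2 + (y_p - y_c)^2}$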
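The objective can be written as a small R function that adds up the squared Euclidean distances between points and their assigned centroids. The function name wcss (within-cluster sum of squares) and the four sample points are hypothetical and used only to illustrate the quantity being minimized.

```r
# Hypothetical helper: total squared distance of points to their centroids.
# points:    numeric matrix, one point per row
# centroids: numeric matrix, one centroid per row
# cluster:   vector giving the centroid row assigned to each point
wcss <- function(points, centroids, cluster) {
  points <- as.matrix(points)
  centroids <- as.matrix(centroids)
  diffs <- points - centroids[cluster, , drop = FALSE]  # point minus its centroid
  sum(diffs^2)
}

# Made-up example: every point lies exactly 1 unit from its centroid.
pts <- rbind(c(0, 0), c(0, 2), c(4, 0), c(4, 2))
ctr <- rbind(c(0, 1), c(4, 1))
lab <- c(1, 1, 2, 2)
total <- wcss(pts, ctr, lab)
print(total)
```

A lower value means the points sit closer to their centroids, which is what each K-means step tries to achieve.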
V. Implementation of K-Means with R-Programming
Step 1: Generation of Data
Here some sample data is generated. Two vectors,
vector1 and vector2, are defined and combined into a 2-D array
named datapoints, which holds the data points as (x, y)
coordinate pairs.
> vector1 <- c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)
> vector2 <- c(1, 2, 3, 4, 5, 6, 7, 8)
> datapoints <- array(c(vector1, vector2), dim = c(8, 2))
> print(datapoints)
The datapoints object defined here is a 2-D array. The first
column holds the X coordinates, and the second column
holds the Y coordinates.
It is printed as shown below:
[,1] [,2]
[1,] 1.0 1
[2,] 1.5 2
[3,] 2.0 3
[4,] 2.5 4
[5,] 3.0 5
[6,] 3.5 6
[7,] 4.0 7
[8,] 4.5 8
Next we plot the data
points to visualize them using the plot function in R
programming. The command is shown below:
> plot(datapoints)
Step 2: Initiate Random Centroids for k-Clusters
We initialize k = 2 clusters. The vectors c1 and c2 hold the values (1.5, 2) and
(3, 5), and the centroid array is built from the same four numbers.
> k = 2
> c1 = c(1.5, 2)
> c2 = c(3, 5)
> centroid = array(c(1.5, 2, 3, 5), dim = c(k, 2))
> print(centroid)
Here k = 2 is the number of clusters, and centroid is an array holding
one coordinate pair per row. Because array() fills its values column by
column, the two centroids stored in the rows are (1.5, 3) and (2, 5), as
the printed array shows:
[,1] [,2]
[1,] 1.5 3
[2,] 2.0 5
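If the starting centroids are meant to be exactly the pairs (1.5, 2) and (3, 5), the values can be laid out row by row instead. The sketch below contrasts the two layouts; the names centroid_by_col and centroid_by_row are hypothetical, and the row-wise variant is an alternative shown for comparison, not the array used in the rest of this walkthrough.

```r
k <- 2
# Column-major fill (default for array): rows become (1.5, 3) and (2, 5).
centroid_by_col <- array(c(1.5, 2, 3, 5), dim = c(k, 2))
# Row-wise fill: rows become (1.5, 2) and (3, 5).
centroid_by_row <- matrix(c(1.5, 2, 3, 5), nrow = k, byrow = TRUE)
print(centroid_by_col)
print(centroid_by_row)
```

Choosing a different starting layout changes the initial centroids and therefore the distances computed in the next step.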
Using the plot function, we plot the data points
and the initial centroids on the same plot. We use
the points function to draw the centroids.
The points function is used to highlight points of interest
using different colors. Centroids are shown in
red.
> plot(datapoints[,1], datapoints[,2])
> points(centroid[,1], centroid[,2], col="red")
Step 3: Distance Calculation from Each Point
The distance between each centroid and the data points is
calculated using the Euclidean distance formula. For a data point p = (x_p, y_p)
and a centroid c = (x_c, y_c), the Euclidean distance is defined as follows:
$d(p, c) = \sqrt{(x_p - x_c)^2 + (y_p - y_c)^2}$
We use the equation above in the
following sub-section. The Euclidean
distance is calculated in three steps.
1. Calculate the difference between the corresponding X
and Y coordinates of the data point and the centroid.
2. Calculate the sum of the squares of the differences computed
in Step 1.
3. Find the square root of the sum of squares of differences
calculated in Step 2.
Square of difference: (datapoint_i - centroid)^2
> dist_frm_clst1 <- (datapoints[,] - centroid[1,])^2
Addition and square root:
> dist_frm_clst1 = sqrt(dist_frm_clst1[,1] + dist_frm_clst1[,2])
> dist_frm_clst1
[1] 0.7071068 1.8027756 1.5811388 1.1180340 3.8078866 3.0413813 6.0415230
[8] 5.2201533
The same two operations are repeated for the second centroid:
> dist_frm_clst2 = (datapoints[,] - centroid[2,])^2
> dist_frm_clst2 = sqrt(dist_frm_clst2[,1] + dist_frm_clst2[,2])
> dist_frm_clst2
[1] 1.414214 4.609772 1.000000 2.692582 3.162278 1.802776 5.385165 3.041381
Here dist_frm_clst1 holds the distance between each
point and centroid 1. Likewise, dist_frm_clst2 holds the
distances for centroid 2.
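One detail of R arithmetic is worth checking at this step: when a length-2 vector is subtracted from an 8 x 2 array, R recycles the vector element by element down the columns, so it does not by itself match the first value to column 1 and the second value to column 2. A minimal sketch of a helper that pairs each coordinate with its own column is given below. The name euclid_to_centroid is hypothetical, and the centroid (1.5, 3) is the first row of the centroid array printed in Step 2; the values this helper returns can differ from the listing above.

```r
vector1 <- c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)
vector2 <- c(1, 2, 3, 4, 5, 6, 7, 8)
datapoints <- array(c(vector1, vector2), dim = c(8, 2))

# Hypothetical helper: Euclidean distance from every row of `points`
# to a single centroid given as c(x, y).
euclid_to_centroid <- function(points, centroid) {
  diffs <- sweep(points, 2, centroid)  # column 1 minus x, column 2 minus y
  sqrt(rowSums(diffs^2))               # sum of squares, then square root
}

d <- euclid_to_centroid(datapoints, c(1.5, 3))
print(d)
```

sweep with margin 2 subtracts one value per column, which mirrors the three steps listed above: difference, sum of squares, square root.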
> tot_dist = array(c(dist_frm_clst1, dist_frm_clst2), dim = c(8, 2))
> tot_dist
[,1] [,2]
[1,] 0.7071068 1.414214
[2,] 1.8027756 4.609772
[3,] 1.5811388 1.000000
[4,] 1.1180340 2.692582
[5,] 3.8078866 3.162278
[6,] 3.0413813 1.802776
[7,] 6.0415230 5.385165
[8,] 5.2201533 3.041381
Step 4: Compare and Finalize the Closest Centroids
Let's create a logical vector by comparing
dist_frm_clst1 and dist_frm_clst2. This vector is
made up of the Boolean values TRUE and FALSE. We
create it using a conditional expression.
The condition is as follows: the distance to the first
centroid is less than or equal to the distance to the second centroid. Points
that satisfy this condition belong to cluster 1. The
remaining points belong to cluster 2.
> c(tot_dist[,1] <= tot_dist[,2])
[1] TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
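The two-way comparison above works for k = 2. For any number of centroids, the same decision can be sketched by taking, for each row of the distance table, the column with the smallest value. The variable name cluster_id is hypothetical; the numbers are copied from the tot_dist table above.

```r
# Distances copied from the tot_dist table (one column per centroid).
tot_dist <- array(c(0.7071068, 1.8027756, 1.5811388, 1.1180340,
                    3.8078866, 3.0413813, 6.0415230, 5.2201533,
                    1.414214, 4.609772, 1.000000, 2.692582,
                    3.162278, 1.802776, 5.385165, 3.041381),
                  dim = c(8, 2))

# Index of the smallest distance in each row; ties go to the lower index,
# which matches the <= comparison used for two clusters.
cluster_id <- apply(tot_dist, 1, which.min)
print(cluster_id)
```

A value of 1 corresponds to TRUE in the logical vector and a value of 2 to FALSE.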
Using the logical vector above, we obtain the
elements of the first cluster. The operation used below is an
example of conditional selection. The X coordinates of the elements
in the array datapoints that satisfy this condition are printed.
> datapoints[,1][c(tot_dist[,1] <= tot_dist[,2])]
[1] 1.0 1.5 2.5
To find the centroid of the newly formed cluster, we
take the mean of all the points obtained above. The reasoning
is as follows: we need a point that is as close as possible to all the cluster's
data points, and the mean of the data points is the
point that minimizes the total squared distance to them.
> mean(datapoints[,1][c(tot_dist[,1] <= tot_dist[,2])])
[1] 1.666667
We calculate the mean using the R
function mean. This is an example of how we conditionally select the elements
that belong to a cluster and how we find the X coordinate of its
centroid.
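> # centroid of cluster 1: points at least as close to centroid 1
> c1 = c(mean(datapoints[,1][c(tot_dist[,1] <= tot_dist[,2])]),
+        mean(datapoints[,2][c(tot_dist[,1] <= tot_dist[,2])]))
> # centroid of cluster 2: the remaining points
> c2 = c(mean(datapoints[,1][c(tot_dist[,1] > tot_dist[,2])]),
+        mean(datapoints[,2][c(tot_dist[,1] > tot_dist[,2])]))
We compute the X and Y coordinates of each centroid
using the code above. We store the centroid of cluster 1 in c1 and the
centroid of cluster 2 in c2. We copy these vectors into a new
array called new_centroid.
> new_centroid = array(0, dim = c(k, 2))
> new_centroid[1,] = c1
> new_centroid[2,] = c2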
The array new_centroid contains the updated centroids of
the formed clusters. Therefore, we have implemented one
iteration of the algorithm successfully.
> new_centroid
[,1] [,2]
[1,] 1.666667 2.333333
[2,] 3.400000 5.800000
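The centroid update can also be sketched as a loop over cluster labels, which avoids repeating the selection expression for each coordinate. The label vector cluster_id is a hypothetical name; its values restate the TRUE/FALSE vector from this step as 1 and 2.

```r
vector1 <- c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)
vector2 <- c(1, 2, 3, 4, 5, 6, 7, 8)
datapoints <- array(c(vector1, vector2), dim = c(8, 2))

k <- 2
cluster_id <- c(1, 1, 2, 1, 2, 2, 2, 2)   # TRUE -> 1, FALSE -> 2

new_centroid <- array(0, dim = c(k, 2))
for (j in seq_len(k)) {
  # Column means of the rows that belong to cluster j.
  new_centroid[j, ] <- colMeans(datapoints[cluster_id == j, , drop = FALSE])
}
print(new_centroid)
```

With these labels the loop reproduces the new_centroid array printed above.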
Let’s plot the new centroids using the following
code:
> plot(datapoints[,1], datapoints[,2])
> points(centroid[,1], centroid[,2], col="red")
> points(new_centroid[,1], new_centroid[,2], col="green")
The old and updated centroids are shown in the
figure below.
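As a cross-check, the same data can be passed to the kmeans function from R's stats package. This is a sketch under stated assumptions: the variable names start and fit are hypothetical, the starting array repeats the one built in Step 2, and algorithm = "Lloyd" is chosen because it follows the assign-then-average scheme described in this paper. The built-in function iterates until convergence, so its centers need not equal the result of the single pass shown above.

```r
vector1 <- c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)
vector2 <- c(1, 2, 3, 4, 5, 6, 7, 8)
datapoints <- array(c(vector1, vector2), dim = c(8, 2))

# Same starting array as in Step 2 (rows are the initial centroids).
start <- array(c(1.5, 2, 3, 5), dim = c(2, 2))

fit <- kmeans(datapoints, centers = start, algorithm = "Lloyd")
print(fit$centers)   # final centroids, one per row
print(fit$cluster)   # cluster index assigned to each data point
```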
VI. CONCLUSION
K-means clustering is one of the most popular and
widely used clustering algorithms, usually applied
when solving clustering tasks to get an idea of the
structure of the dataset. The main aim of the K-means
algorithm is to group data points into distinct,
non-overlapping subgroups such that each group contains
similar data items. Here we implemented the K-means
algorithm using R programming and computed new
centroids for the clusters successfully. Data was
generated using vectors in R, and the Euclidean distance
formula was used for distance calculation. We calculated
the new centroids using the mean function in R and plotted
them on a graph. Hence we followed all the K-means
algorithm steps for centroid computation.
REFERENCES
[1] International Journal of Pure and Applied Mathematics,
Volume 117, No. 7, 2017, pp. 157-164, ISSN: 1311-8080 (printed
version); ISSN: 1314-3395 (on-line version), url:
http://www.ijpam.eu, Special Issue, “A k-means Clustering
Algorithm on Numeric Data”
[2] International Journal of Information & Computation
Technology. ISSN 0974-2239 Volume 4, Number 17 (2014),
pp. 1847-1860, © International Research Publications House,
http://www.irphouse.com, “A Review on K-means Data
Clustering Approach”
[3] 2017 6th International Conference on Reliability,
Infocom Technologies and Optimization (ICRITO) (Trends
and Future Directions), Sep. 20-22, 2017, AIIT, Amity
University Uttar Pradesh, Noida, India “A Detailed Study of
Clustering Algorithms”
[4] https://data-flair.training/blogs/using-r-for-data-science/
Ad

Recommended

Machine Learning, Statistics And Data Mining
Machine Learning, Statistics And Data Mining
Jason J Pulikkottil
 
Neural nw k means
Neural nw k means
Eng. Dr. Dennis N. Mwighusa
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
refedey275
 
Ensemble_instance_unsupersied_learning 01_02_2024.pptx
Ensemble_instance_unsupersied_learning 01_02_2024.pptx
vigneshmatta2004
 
For iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptx
SureshPolisetty2
 
Machine learning hands on clustering
Machine learning hands on clustering
Dr. Dragos Crintea
 
Intro to machine learning
Intro to machine learning
Akshay Kanchan
 
K MEANS CLUSTERING - UNSUPERVISED LEARNING
K MEANS CLUSTERING - UNSUPERVISED LEARNING
PalanivelG6
 
5. Machine Learning.pptx
5. Machine Learning.pptx
ssuser6654de1
 
Lecture 09(introduction to machine learning)
Lecture 09(introduction to machine learning)
Jeet Das
 
3.Unsupervised Learning.ppt presenting machine learning
3.Unsupervised Learning.ppt presenting machine learning
PriyankaRamavath3
 
Machine learning ( Part 3 )
Machine learning ( Part 3 )
Sunil OS
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
Experfy
 
Machine Learning Approach.pptx
Machine Learning Approach.pptx
CYPatrickKwee
 
Ml9 introduction to-unsupervised_learning_and_clustering_methods
Ml9 introduction to-unsupervised_learning_and_clustering_methods
ankit_ppt
 
Lec13 Clustering.pptx
Lec13 Clustering.pptx
Khalid Rabayah
 
Supervised and unsupervised learning
Supervised and unsupervised learning
AmAn Singh
 
Presentation Template__TY_AIML_IE2_Project (1).pptx
Presentation Template__TY_AIML_IE2_Project (1).pptx
SYETB202RandhirBhosa
 
Introduction to data mining and machine learning
Introduction to data mining and machine learning
Tilani Gunawardena PhD(UNIBAS), BSc(Pera), FHEA(UK), CEng, MIESL
 
Clustering in Machine Learning.pdf
Clustering in Machine Learning.pdf
SudhanshiBakre1
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
rahuljain582793
 
Machine Learning techniques used in AI.
Machine Learning techniques used in AI.
ArchanaT32
 
Introduction MAchine Learning . Machine Learning is trendy concept
Introduction MAchine Learning . Machine Learning is trendy concept
KiranMittal7
 
Unsupervised Learning Clustering - Mathematcis
Unsupervised Learning Clustering - Mathematcis
igfreeze7
 
A Comprehensive Overview of Clustering Algorithms in Pattern Recognition
A Comprehensive Overview of Clustering Algorithms in Pattern Recognition
IOSR Journals
 
demo lecture for foundation class for btech
demo lecture for foundation class for btech
ROHIT738213
 
Unit 2 unsupervised learning.pptx
Unit 2 unsupervised learning.pptx
Dr.Shweta
 
Machine Learning.pptx
Machine Learning.pptx
NitinSharma134320
 
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 

More Related Content

Similar to Machine Learning, K-means Algorithm Implementation with R (20)

5. Machine Learning.pptx
5. Machine Learning.pptx
ssuser6654de1
 
Lecture 09(introduction to machine learning)
Lecture 09(introduction to machine learning)
Jeet Das
 
3.Unsupervised Learning.ppt presenting machine learning
3.Unsupervised Learning.ppt presenting machine learning
PriyankaRamavath3
 
Machine learning ( Part 3 )
Machine learning ( Part 3 )
Sunil OS
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
Experfy
 
Machine Learning Approach.pptx
Machine Learning Approach.pptx
CYPatrickKwee
 
Ml9 introduction to-unsupervised_learning_and_clustering_methods
Ml9 introduction to-unsupervised_learning_and_clustering_methods
ankit_ppt
 
Lec13 Clustering.pptx
Lec13 Clustering.pptx
Khalid Rabayah
 
Supervised and unsupervised learning
Supervised and unsupervised learning
AmAn Singh
 
Presentation Template__TY_AIML_IE2_Project (1).pptx
Presentation Template__TY_AIML_IE2_Project (1).pptx
SYETB202RandhirBhosa
 
Introduction to data mining and machine learning
Introduction to data mining and machine learning
Tilani Gunawardena PhD(UNIBAS), BSc(Pera), FHEA(UK), CEng, MIESL
 
Clustering in Machine Learning.pdf
Clustering in Machine Learning.pdf
SudhanshiBakre1
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
rahuljain582793
 
Machine Learning techniques used in AI.
Machine Learning techniques used in AI.
ArchanaT32
 
Introduction MAchine Learning . Machine Learning is trendy concept
Introduction MAchine Learning . Machine Learning is trendy concept
KiranMittal7
 
Unsupervised Learning Clustering - Mathematcis
Unsupervised Learning Clustering - Mathematcis
igfreeze7
 
A Comprehensive Overview of Clustering Algorithms in Pattern Recognition
A Comprehensive Overview of Clustering Algorithms in Pattern Recognition
IOSR Journals
 
demo lecture for foundation class for btech
demo lecture for foundation class for btech
ROHIT738213
 
Unit 2 unsupervised learning.pptx
Unit 2 unsupervised learning.pptx
Dr.Shweta
 
Machine Learning.pptx
Machine Learning.pptx
NitinSharma134320
 
5. Machine Learning.pptx
5. Machine Learning.pptx
ssuser6654de1
 
Lecture 09(introduction to machine learning)
Lecture 09(introduction to machine learning)
Jeet Das
 
3.Unsupervised Learning.ppt presenting machine learning
3.Unsupervised Learning.ppt presenting machine learning
PriyankaRamavath3
 
Machine learning ( Part 3 )
Machine learning ( Part 3 )
Sunil OS
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
Experfy
 
Machine Learning Approach.pptx
Machine Learning Approach.pptx
CYPatrickKwee
 
Ml9 introduction to-unsupervised_learning_and_clustering_methods
Ml9 introduction to-unsupervised_learning_and_clustering_methods
ankit_ppt
 
Supervised and unsupervised learning
Supervised and unsupervised learning
AmAn Singh
 
Presentation Template__TY_AIML_IE2_Project (1).pptx
Presentation Template__TY_AIML_IE2_Project (1).pptx
SYETB202RandhirBhosa
 
Clustering in Machine Learning.pdf
Clustering in Machine Learning.pdf
SudhanshiBakre1
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
rahuljain582793
 
Machine Learning techniques used in AI.
Machine Learning techniques used in AI.
ArchanaT32
 
Introduction MAchine Learning . Machine Learning is trendy concept
Introduction MAchine Learning . Machine Learning is trendy concept
KiranMittal7
 
Unsupervised Learning Clustering - Mathematcis
Unsupervised Learning Clustering - Mathematcis
igfreeze7
 
A Comprehensive Overview of Clustering Algorithms in Pattern Recognition
A Comprehensive Overview of Clustering Algorithms in Pattern Recognition
IOSR Journals
 
demo lecture for foundation class for btech
demo lecture for foundation class for btech
ROHIT738213
 
Unit 2 unsupervised learning.pptx
Unit 2 unsupervised learning.pptx
Dr.Shweta
 

More from IRJET Journal (20)

Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
Kiona – A Smart Society Automation Project
Kiona – A Smart Society Automation Project
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
Kiona – A Smart Society Automation Project
Kiona – A Smart Society Automation Project
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Ad

Recently uploaded (20)

362 Alec Data Center Solutions-Slysium Data Center-AUH-Adaptaflex.pdf
362 Alec Data Center Solutions-Slysium Data Center-AUH-Adaptaflex.pdf
djiceramil
 
Mechanical Vibration_MIC 202_iit roorkee.pdf
Mechanical Vibration_MIC 202_iit roorkee.pdf
isahiliitr
 
Complete University of Calculus :: 2nd edition
Complete University of Calculus :: 2nd edition
Shabista Imam
 
retina_biometrics ruet rajshahi bangdesh.pptx
retina_biometrics ruet rajshahi bangdesh.pptx
MdRakibulIslam697135
 
NEW Strengthened Senior High School Gen Math.pptx
NEW Strengthened Senior High School Gen Math.pptx
DaryllWhere
 
Decoding Kotlin - Your Guide to Solving the Mysterious in Kotlin - Devoxx PL ...
Decoding Kotlin - Your Guide to Solving the Mysterious in Kotlin - Devoxx PL ...
João Esperancinha
 
Solar thermal – Flat plate and concentrating collectors .pptx
Solar thermal – Flat plate and concentrating collectors .pptx
jdaniabraham1
 
Introduction to Python Programming Language
Introduction to Python Programming Language
merlinjohnsy
 
Machine Learning - Classification Algorithms
Machine Learning - Classification Algorithms
resming1
 
Fundamentals of Digital Design_Class_12th April.pptx
Fundamentals of Digital Design_Class_12th April.pptx
drdebarshi1993
 
Fundamentals of Digital Design_Class_21st May - Copy.pptx
Fundamentals of Digital Design_Class_21st May - Copy.pptx
drdebarshi1993
 
ElysiumPro Company Profile 2025-2026.pdf
ElysiumPro Company Profile 2025-2026.pdf
info751436
 
May 2025: Top 10 Read Articles in Data Mining & Knowledge Management Process
May 2025: Top 10 Read Articles in Data Mining & Knowledge Management Process
IJDKP
 
DESIGN OF REINFORCED CONCRETE ELEMENTS S
DESIGN OF REINFORCED CONCRETE ELEMENTS S
prabhusp8
 
Fatality due to Falls at Working at Height
Fatality due to Falls at Working at Height
ssuserb8994f
 
Stay Safe Women Security Android App Project Report.pdf
Stay Safe Women Security Android App Project Report.pdf
Kamal Acharya
 
special_edition_using_visual_foxpro_6.pdf

Machine Learning, K-means Algorithm Implementation with R

International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 09 Issue: 01 | Jan 2022 | www.irjet.net
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1268

Mrs. Kavita Ganesh Kurale1, Mrs. Rohini Sudhir Patil2
1 Sr. Lecturer, Dept. of Computer Engineering, Sant Gajanan Maharaj Rural Polytechnic, Mahgaon, Maharashtra, India.
2 HOD, Dept. of Computer Engineering, Sant Gajanan Maharaj Rural Polytechnic, Mahgaon, Maharashtra, India.

ABSTRACT: Machine learning (ML) is a growing technology and the scientific study of algorithms that enable computers to learn automatically from previous data. Machine learning uses numerous algorithms to build mathematical models and to make predictions from the data available. Machine learning is an application of artificial intelligence and is either supervised or unsupervised. K-means is among the most commonly used unsupervised learning algorithms for cluster analysis. In this paper we work through an implementation of K-means in R.

Keywords: Machine Learning, K-means, R Programming, supervised and unsupervised learning.

I. INTRODUCTION
Machine learning is an application of artificial intelligence (AI). In machine learning, systems automatically learn and improve their performance by using previously computed results. Machine learning focuses on developing computer programs that can access data, learn from it, and compute results.
Learning starts from observing data, from direct experience, or from instruction. Patterns are then identified in the data, and future decisions are made based on the examples provided. The main aim is to allow computers to learn automatically, without manual intervention, and to compute results based on previously computed results.

Some Machine Learning Methods

II. Classes of Machine Learning
1. Supervised Learning
2. Unsupervised Learning

Supervised machine learning algorithms: in supervised learning, the machine is trained on data that is labeled with the correct answers. The machine is then given a new data set, and the supervised learning algorithm uses what it learned from the training examples to produce an outcome for the new data. After sufficient training, the system can provide targets for any new input. The learning algorithm can also compare its output with the correct output and find errors in order to modify the model accordingly.

In contrast, unsupervised machine learning algorithms are used when the data used for training is neither classified nor labeled. The system does not determine a right output; instead it explores the data and can draw inferences from data sets to describe hidden structures in unlabeled data.

Semi-supervised machine learning algorithms fall between supervised and unsupervised learning, since they use both labeled and unlabeled data for training, typically a small amount of labeled data and a large amount of unlabeled data. Systems that use this approach can considerably improve learning accuracy. Generally, semi-supervised learning is chosen when acquiring labeled data requires skilled and relevant resources, whereas acquiring unlabeled data generally does not require additional resources.
Reinforcement machine learning algorithms interact with their environment by producing actions and discovering errors or rewards. Trial-and-error search and delayed reward are the most relevant characteristics of reinforcement learning. This approach allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize performance. Simple reward feedback is needed for the agent to learn which action is best; this is known as the reinforcement signal.

Machine learning enables the analysis of huge quantities of data. While it generally delivers faster and more accurate results when identifying profitable opportunities or dangerous risks, it may also require additional time and resources to train properly. Combining machine learning with AI and cognitive technologies can make it more effective in processing large volumes of information.

III. Clustering
Clustering is the most popular approach in unsupervised learning: data is grouped based on the similarity of the data points. Clustering has numerous real-life uses and is applied in fields such as image recognition, pattern analysis, medical informatics, genomics, and data compression. In machine learning it belongs to unsupervised learning, because the data points are not labeled and there is no explicit mapping between inputs and outputs.

IV. K-Means Clustering
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and straightforward way to classify a given data set into a chosen number of clusters, k. The main idea is to define k centers, one for each cluster. These centers should be placed carefully, because different positions lead to different results. A good choice is to place them as far away from each other as possible.

[Figure: K-Means Clustering]

The next step is to take each data point from the given data set and assign it to the nearest center. When every data point has been assigned, the first step is complete and an early grouping has been formed. At this point we need to re-calculate k new centroids of the clusters resulting from the previous step. (1)

The K-means algorithm is very simple [3]:
1. Select the number of clusters K and the initial centroids.
2. Repeat steps 3 and 4 for all data points in the given data set.
3. For each data point, find the closest centroid.
4. Form K clusters by assigning each point to its nearest centroid.
5. For each cluster, compute the new centroid.

K-means algorithm properties [3]:
1. It is efficient when processing large data sets.
2. It works only on numeric values.
3. The clusters have a convex shape.

Objective of the K-means
The objective of K-means clustering is to minimize the Euclidean distance between each point and the centroid of its cluster, summed over all points (in the standard formulation, the sum of squared distances).
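The five steps listed above can be sketched as a short R function. This is a minimal illustration under stated assumptions, not the implementation developed in Section V: the name simple_kmeans, its arguments, the max_iter cap, and the convergence tolerance are all hypothetical. Squared distances are compared because they select the same nearest centroid as the distances themselves.

```r
# Minimal k-means sketch (illustrative; all names here are hypothetical).
simple_kmeans <- function(points, centroids, max_iter = 100) {
  points <- as.matrix(points)
  centroids <- as.matrix(centroids)   # one centroid per row
  k <- nrow(centroids)
  cluster <- integer(nrow(points))
  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every point to every centroid (n x k).
    d2 <- matrix(
      sapply(seq_len(k), function(j) rowSums(sweep(points, 2, centroids[j, ])^2)),
      nrow = nrow(points)
    )
    # Steps 3 and 4: assign each point to its nearest centroid.
    cluster <- max.col(-d2, ties.method = "first")
    # Step 5: recompute each centroid as the mean of its assigned points.
    new_centroids <- centroids
    for (j in seq_len(k)) {
      members <- points[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) new_centroids[j, ] <- colMeans(members)
    }
    # Stop once the centroids no longer move.
    if (all(abs(new_centroids - centroids) < 1e-9)) break
    centroids <- new_centroids
  }
  list(centroids = centroids, cluster = cluster)
}

# Usage with eight sample points and two starting centroids given as rows.
pts  <- cbind(c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5), c(1, 2, 3, 4, 5, 6, 7, 8))
init <- rbind(c(1.5, 2), c(3, 5))
res  <- simple_kmeans(pts, init)
res$centroids
res$cluster
```

Passing the starting centroids with rbind() keeps each centroid on its own row, which is the layout the function assumes. For real work, the built-in kmeans() function in R's stats package is the usual choice.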
Euclidean distance formula, for a point $p = (p_x, p_y)$ and a centroid $c = (c_x, c_y)$: $d(p, c) = \sqrt{(p_x - c_x)^2 + (p_y - c_y)^2}$

V. Implementation of K-Means with R Programming

Step 1: Generation of Data
Here we generate some sample data. Two vectors, vector1 and vector2, are defined and combined into a 2-D array named datapoints, whose rows are the (x, y) coordinate pairs of the data points.

> vector1 <- c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)
> vector2 <- c(1, 2, 3, 4, 5, 6, 7, 8)
> datapoints <- array(c(vector1, vector2), dim = c(8, 2))
> print(datapoints)

The array datapoints is two-dimensional. The first column holds the x-coordinates and the second column holds the y-coordinates, as shown below:

     [,1] [,2]
[1,]  1.0    1
[2,]  1.5    2
[3,]  2.0    3
[4,]  2.5    4
[5,]  3.0    5
[6,]  3.5    6
[7,]  4.0    7
[8,]  4.5    8

We now plot the data points and visualize them using the plot function in R. The output is shown below:

> plot(datapoints)
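A brief note on how the array in Step 1 is filled: array() places its values column by column, which is why c(vector1, vector2) puts the x-values in column 1 and the y-values in column 2. The sketch below builds the same points with cbind() instead; the names x, y, and pts are illustrative and do not appear in the paper.

```r
# Same eight points, built column-wise with cbind(); the column names are an addition.
x <- c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)
y <- c(1, 2, 3, 4, 5, 6, 7, 8)
pts <- cbind(x = x, y = y)

dim(pts)                                      # number of rows and columns
pts[3, ]                                      # third point, with named coordinates
all(pts == array(c(x, y), dim = c(8, 2)))     # element-wise comparison with array()
```

Named columns make later expressions such as pts[, "x"] easier to read than pts[, 1].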
Step 2: Initiate Random Centroids for k-Clusters
We initialize 2 clusters with the centroids c1 = (1.5, 2) and c2 = (3, 5).

> k = 2
> c1 = c(1.5, 2)
> c2 = c(3, 5)
> centroid = array(c(1.5, 2, 3, 5), dim = c(k, 2))
> print(centroid)

We define k = 2 as the number of clusters and store the centroids in the array centroid. Because array() fills its values column by column, the rows of centroid are (1.5, 3) and (2.0, 5), as the printed array shows:

     [,1] [,2]
[1,]  1.5    3
[2,]  2.0    5

Using the plot function, we plot the data points and the initial centroids on the same plot. The points function adds the centroids to the plot and highlights them in a different color; here the centroids are drawn in red.

> plot(datapoints[,1], datapoints[,2])
> points(centroid[,1], centroid[,2], col = "red")

Step 3: Distance Calculation from Each Point
The distance between each centroid and every data point is calculated using the Euclidean distance formula. For a point $p = (p_x, p_y)$ and a centroid $c = (c_x, c_y)$, the Euclidean distance is defined as follows:

$d(p, c) = \sqrt{(p_x - c_x)^2 + (p_y - c_y)^2}$

We apply this equation in three steps:
1. Calculate the difference between the corresponding X and Y coordinates of the data point and the centroid.
2. Calculate the sum of the squares of the differences computed in step 1.
3. Find the square root of the sum of squares calculated in step 2.
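The three steps above can be sketched as a small helper function. This is an illustrative sketch rather than the paper's code: the name dist_to_centroid is hypothetical, and it assumes the centroid is passed as a plain (x, y) vector. It uses sweep() so that the centroid's x-coordinate is subtracted from column 1 and its y-coordinate from column 2.

```r
# Euclidean distance from every row of `points` to one centroid (hypothetical helper).
dist_to_centroid <- function(points, centroid) {
  diffs <- sweep(points, 2, centroid)   # step 1: coordinate-wise differences
  sq_sum <- rowSums(diffs^2)            # step 2: sum of squared differences
  sqrt(sq_sum)                          # step 3: square root
}

pts <- cbind(c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5), c(1, 2, 3, 4, 5, 6, 7, 8))
d <- dist_to_centroid(pts, c(1.5, 2))
d
```

The second point coincides with the centroid (1.5, 2), so its distance is zero, which gives a quick sanity check on the result.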
Difference and square of difference, (datapoint_i − centroid)^2, for centroid 1:

> dist_frm_clst1 <- (datapoints[,] - centroid[1,])^2

Addition and square root:

> dist_frm_clst1 = sqrt(dist_frm_clst1[,1] + dist_frm_clst1[,2])
> dist_frm_clst1
[1] 0.7071068 1.8027756 1.5811388 1.1180340 3.8078866 3.0413813 6.0415230
[8] 5.2201533

The same two operations for centroid 2:

> dist_frm_clst2 = (datapoints[,] - centroid[2,])^2
> dist_frm_clst2 = sqrt(dist_frm_clst2[,1] + dist_frm_clst2[,2])
> dist_frm_clst2
[1] 1.414214 4.609772 1.000000 2.692582 3.162278 1.802776 5.385165 3.041381

Here dist_frm_clst1 holds the value computed for each point against centroid 1, and dist_frm_clst2 holds the values for centroid 2. Note that R recycles the two-element vector centroid[1,] element by element down the columns of datapoints, so the subtraction above does not pair the centroid's x-coordinate with column 1 and its y-coordinate with column 2. The values printed here are the result of that recycled subtraction; sweep(datapoints, 2, centroid[1,]) would subtract the centroid coordinate by coordinate.

We combine both vectors into the array tot_dist:

> tot_dist = array(c(dist_frm_clst1, dist_frm_clst2), dim = c(8, 2))
> tot_dist
          [,1]     [,2]
[1,] 0.7071068 1.414214
[2,] 1.8027756 4.609772
[3,] 1.5811388 1.000000
[4,] 1.1180340 2.692582
[5,] 3.8078866 3.162278
[6,] 3.0413813 1.802776
[7,] 6.0415230 5.385165
[8,] 5.2201533 3.041381

Step 4: Compare and Finalize the Closest Centroids
Let us create a logical vector by comparing dist_frm_clst1 and dist_frm_clst2, which are the two columns of tot_dist. This vector is made up of the Boolean values TRUE and FALSE. We create it with a conditional expression: the distance to the first centroid is less than or equal to the distance to the second centroid. Points that satisfy this condition belong to cluster 1. The remaining points belong to cluster 2.

> c(tot_dist[,1] <= tot_dist[,2])
[1] TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE

Using the logical vector above, we obtain the elements of the first cluster. The operation below is an example of conditional selection: it prints the x-coordinates of the points in datapoints that satisfy the condition.

> datapoints[,1][c(tot_dist[,1] <= tot_dist[,2])]
[1] 1.0 1.5 2.5

To find the centroid of the newly formed cluster, we take the mean of all the points obtained above. The reasoning is as follows: we need the point that is closest to all the data points of the cluster, and averaging the data points gives the point with the smallest sum of squared distances to them.

> mean(datapoints[,1][c(tot_dist[,1] <= tot_dist[,2])])
[1] 1.666667

We calculate the mean using the R function mean. This is an example of how we conditionally select the elements that belong to a cluster and how we find its centroid.

> c1 = c(mean(datapoints[,1][c(tot_dist[,1] <= tot_dist[,2])]), mean(datapoints[,2][c(tot_dist[,1] <= tot_dist[,2])]))

The code above computes the X and Y coordinates of the centroid of cluster 1 and stores them in c1. The centroid of cluster 2 is stored in c2 and is computed in the same way from the points that do not satisfy the condition. We copy these vectors to a new array called new_centroid.
> new_centroid[1,] = c1
> new_centroid[2,] = c2

These assignments assume that new_centroid already exists as a k-by-2 array, created for example with array(0, dim = c(k, 2)). The array new_centroid now contains the updated centroids of the formed clusters, which completes one assignment-and-update pass of the algorithm.

> new_centroid
         [,1]     [,2]
[1,] 1.666667 2.333333
[2,] 3.400000 5.800000

Let us plot the new centroids using the following code:

> plot(datapoints[,1], datapoints[,2])
> points(centroid[,1], centroid[,2], col = "red")
> points(new_centroid[,1], new_centroid[,2], col = "green")

The old and updated centroids are shown in the figure below.

VI. CONCLUSION
K-means clustering is one of the most popular and widely used clustering algorithms, and it is usually the first one applied when solving clustering tasks to get an idea of the structure of the data set. The main aim of the K-means algorithm is to group data points into distinct, non-overlapping subgroups such that each group contains similar data items. Here we implemented the K-means algorithm in R and computed new centroids for the clusters. The data was generated using vectors in R, and the Euclidean distance formula was used for the distance calculation. We calculated the new centroids using the mean function in R and plotted them on a graph. Hence we followed all the steps of the K-means algorithm for centroid computation.

REFERENCES
[1] "A k-means Clustering Algorithm on Numeric Data," International Journal of Pure and Applied Mathematics, Special Issue, Volume 117, No. 7, 2017, pp. 157-164. ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version). http://www.ijpam.eu
[2] "A Review on K-means Data Clustering Approach," International Journal of Information & Computation Technology, ISSN 0974-2239, Volume 4, Number 17 (2014), pp. 1847-1860. © International Research Publications House. http://www.irphouse.com
[3] "A Detailed Study of Clustering Algorithms," 2017 6th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), Sep. 20-22, 2017, AIIT, Amity University Uttar Pradesh, Noida, India.
[4] https://data-flair.training/blogs/using-r-for-data-science/