How To Annotate Clusters with Circle/Ellipse by a Variable in R ggplot2
Last Updated :
24 Mar, 2022
In this article, we will discuss how to annotate clusters with a circle or ellipse by a categorical variable in the R Programming Language using the ggplot2 package.
To add a circle or ellipse around a cluster of data points, we use the geom_mark_circle() and geom_mark_ellipse() functions of the ggforce package. These functions automatically compute the circle or ellipse that encloses each cluster of points defined by a categorical variable.
First, we will plot the data in a scatter plot using the geom_point function of the ggplot2 package. We will use the color parameter of the aes() function to color the plot by a categorical variable group.
Syntax:
ggplot(df, aes( x, y ) ) + geom_point( aes( color ))
Arguments:
- df: determines the data frame to be used.
- x and y: determine the x-axis and y-axis variables respectively.
- color: determines the categorical variable for coloring the data point clusters.
Example:
Here is a basic scatter plot made using the geom_point() function of the ggplot2 package. We have colored the points by the categorical variable group.
R
# load library tidyverse
library(tidyverse)
# set theme
theme_set(theme_bw(16))
# create x and y vector
xAxis <- rnorm(1000)
yAxis <- rnorm(1000) + xAxis + 10
# create groups in variable using conditional
# statements
group <- rep(1, 1000)
group[xAxis > -1.5] <- 2
group[xAxis > -0.5] <- 3
group[xAxis > 0.5] <- 4
group[xAxis > 1.5] <- 5
# create sample data frame
sample_data <- data.frame(xAxis, yAxis, group)
# create a scatter plot with points colored by
# group
ggplot(sample_data, aes(x = xAxis,
y = yAxis))+
geom_point(aes(color = as.factor(group)))
Output:

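The grouping step above assigns each cluster with a series of conditional statements. As a hedged alternative, the sketch below builds the same five groups in a single call to base R's cut(). The seed value and the name p_points are assumptions added here for reproducibility and are not part of the original example.

```r
library(ggplot2)

set.seed(42)  # assumption: any fixed seed, used only to make the plot reproducible
xAxis <- rnorm(1000)
yAxis <- rnorm(1000) + xAxis + 10

# cut() uses right-closed intervals by default, so the breaks below
# reproduce the thresholds -1.5, -0.5, 0.5 and 1.5 used above
group <- cut(xAxis,
             breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
             labels = 1:5)

sample_data <- data.frame(xAxis, yAxis, group)

# scatter plot colored by group (p_points is a hypothetical name)
p_points <- ggplot(sample_data, aes(x = xAxis, y = yAxis)) +
  geom_point(aes(color = group))
p_points
```

Because cut() returns a factor, as.factor() is no longer needed inside aes().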
Annotate circles around cluster:
To annotate a circle around a cluster of points by the group we use the geom_mark_circle() function of the ggforce package. To use this function we first install & import the ggforce package by using:
install.packages('ggforce')
library(ggforce)
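Calling install.packages() on every run downloads the package again. A minimal sketch, assuming you only want to install ggforce when it is missing, uses requireNamespace() as a guard:

```r
# Install ggforce only when it is not already available, then load it
if (!requireNamespace("ggforce", quietly = TRUE)) {
  install.packages("ggforce")
}
library(ggforce)
```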
Now, we will annotate the circle around a cluster of data points by using the geom_mark_circle() function.
Syntax:
ggplot(df, aes( x, y ) ) + geom_point( aes( color )) + geom_mark_circle( aes(color) )
Example:
Here is a basic scatter plot with circles around the clusters of data points, colored by the categorical variable group.
R
# load library tidyverse
library(tidyverse)
library(ggforce)
# set theme
theme_set(theme_bw(16))
# create x and y vector
xAxis <- rnorm(500)
yAxis <- rnorm(500) + xAxis + 10
# create groups in variable using conditional
# statements
group <- rep(1, 500)
group[xAxis > -1.5] <- 2
group[xAxis > -0.5] <- 3
group[xAxis > 0.5] <- 4
group[xAxis > 1.5] <- 5
# create sample data frame
sample_data <- data.frame(xAxis, yAxis, group)
# create a scatter plot with points colored by group
# circles are annotated using geom_mark_circle() function
ggplot(sample_data, aes(x = xAxis,
y = yAxis))+
geom_point(aes(color = as.factor(group)))+
geom_mark_circle(aes(color = as.factor(group)), expand = unit(0.5,"mm"))+
theme(legend.position = "none")
Output:

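The geom_mark_* functions in ggforce also accept a filter aesthetic, which restricts the mark to a subset of the rows. The sketch below circles only the rightmost cluster (group 5) while still plotting every point. The seed, the use of cut() to build the groups, and the name p_circle are assumptions made for this illustration.

```r
library(ggplot2)
library(ggforce)

set.seed(42)  # assumption: fixed seed so the figure is reproducible
xAxis <- rnorm(500)
yAxis <- rnorm(500) + xAxis + 10

# same five groups as in the examples above, built with cut()
group <- cut(xAxis,
             breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
             labels = 1:5)
sample_data <- data.frame(xAxis, yAxis, group)

# all points are drawn, but only rows where group is "5" are circled
p_circle <- ggplot(sample_data, aes(x = xAxis, y = yAxis)) +
  geom_point(aes(color = group)) +
  geom_mark_circle(aes(filter = group == "5"),
                   expand = unit(0.5, "mm")) +
  theme(legend.position = "none")
p_circle
```

Changing the condition inside filter selects a different cluster to highlight.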
Annotate ellipses around cluster:
To annotate an ellipse around a cluster of points by the group we use the geom_mark_ellipse() function of the ggforce package. This function automatically computes the dimensions of the ellipse and overlays it on top of the scatter plot.
Syntax:
ggplot(df, aes( x, y ) ) + geom_point( aes( color )) + geom_mark_ellipse( aes(color) )
Example:
Here is a basic scatter plot with ellipses around the clusters of data points, colored by the categorical variable group.
R
# load library tidyverse
library(tidyverse)
library(ggforce)
# set theme
theme_set(theme_bw(16))
# create x and y vector
xAxis <- rnorm(500)
yAxis <- rnorm(500) + xAxis + 10
# create groups in variable using conditional
# statements
group <- rep(1, 500)
group[xAxis > -1.5] <- 2
group[xAxis > -0.5] <- 3
group[xAxis > 0.5] <- 4
group[xAxis > 1.5] <- 5
# create sample data frame
sample_data <- data.frame(xAxis, yAxis, group)
# create a scatter plot with points colored by group
# ellipses are annotated using geom_mark_ellipse() function
ggplot(sample_data, aes(x = xAxis,
y = yAxis))+
geom_point(aes(color = as.factor(group)))+
geom_mark_ellipse(aes(color = as.factor(group)), expand = unit(0.5,"mm"))+
theme(legend.position = "none")
Output:

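Since the legend is hidden in this example, it can help to write the group name next to each ellipse. The geom_mark_* functions take a label aesthetic for this purpose. The following is a sketch under a few assumptions: the seed, the label text "Group n", and the name p_ellipse are illustrative and not part of the original example.

```r
library(ggplot2)
library(ggforce)

set.seed(42)  # assumption: fixed seed so the figure is reproducible
xAxis <- rnorm(500)
yAxis <- rnorm(500) + xAxis + 10

# same five groups as in the examples above, built with cut()
group <- cut(xAxis,
             breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
             labels = 1:5)
sample_data <- data.frame(xAxis, yAxis, group)

# one ellipse per group, each annotated with a text label
p_ellipse <- ggplot(sample_data, aes(x = xAxis, y = yAxis)) +
  geom_point(aes(color = group)) +
  geom_mark_ellipse(aes(color = group,
                        label = paste("Group", group)),
                    expand = unit(0.5, "mm")) +
  theme(legend.position = "none")
p_ellipse
```

The label must be constant within each group, which is why it is derived from the same variable that defines the ellipses.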
Customizing the aesthetics
We can customize the appearance of the marks drawn by the geom_mark_* functions by using the color, fill, and alpha properties of the aes() function.
Syntax:
ggplot(df, aes( x, y ) ) + geom_point( aes( color )) + geom_mark_ellipse( aes(color, fill, alpha) )
where,
- color: determines the color of the boundary of the circles or ellipses.
- fill: determines the background color of the circles or ellipses.
- alpha: determines the transparency of the circles or ellipses.
Example:
In this example, we will draw a scatter plot overlaid with ellipses whose background is filled according to the categorical variable group.
R
# load library tidyverse
library(tidyverse)
library(ggforce)
# set theme
theme_set(theme_bw(16))
# create x and y vector
xAxis <- rnorm(500)
yAxis <- rnorm(500) + xAxis + 10
# create groups in variable using conditional
# statements
group <- rep(1, 500)
group[xAxis > -1.5] <- 2
group[xAxis > -0.5] <- 3
group[xAxis > 0.5] <- 4
group[xAxis > 1.5] <- 5
# create sample data frame
sample_data <- data.frame(xAxis, yAxis, group)
# create a scatter plot with points colored by group
# ellipses are annotated using geom_mark_ellipse() function
ggplot(sample_data, aes(x = xAxis,
y = yAxis))+
geom_point(aes(color = as.factor(group)))+
geom_mark_ellipse(aes(fill = as.factor(group)), expand = unit(0.5,"mm"))+
theme(legend.position = "none")
Output:

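In the example above only fill is mapped, so the ellipses are drawn with the default transparency. A sketch, assuming a constant transparency is wanted rather than one mapped to the data, sets alpha outside aes() and maps both color and fill to the group. The value 0.15, the seed, and the name p_filled are illustrative choices.

```r
library(ggplot2)
library(ggforce)

set.seed(42)  # assumption: fixed seed so the figure is reproducible
xAxis <- rnorm(500)
yAxis <- rnorm(500) + xAxis + 10

# same five groups as in the examples above, built with cut()
group <- cut(xAxis,
             breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
             labels = 1:5)
sample_data <- data.frame(xAxis, yAxis, group)

# color and fill follow the group; alpha is a fixed value for every ellipse
p_filled <- ggplot(sample_data, aes(x = xAxis, y = yAxis)) +
  geom_point(aes(color = group)) +
  geom_mark_ellipse(aes(color = group, fill = group),
                    alpha = 0.15,
                    expand = unit(0.5, "mm")) +
  theme(legend.position = "none")
p_filled
```

Placing alpha inside aes() would instead map transparency to a variable, which is rarely needed when fill already encodes the group.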