Data Visualisation using ggplot2 (Scatter Plots)
Last Updated :
24 Apr, 2025
A correlation scatter plot is a key tool in data visualization that helps identify the relationship between two continuous variables. In this article, we will discuss how to create a correlation scatter plot using ggplot2 in R. ggplot2 is a popular package for creating informative data visualizations in the R programming language.
- Scatter Plot: A scatter plot is a graphical representation of the relationship between two variables, where each observation is represented by a point on a 2D plane.
- Correlation: Correlation is a measure of the linear association between two variables. The correlation coefficient ranges from -1 to 1, where -1 indicates a perfect negative linear relationship, 0 indicates no linear relationship, and 1 indicates a perfect positive linear relationship.
- ggplot2: ggplot2 is a widely used data visualization library in R. It provides a simple and intuitive syntax for creating complex visualizations.
- Load the ggplot2 library: Before creating a correlation scatter plot, load the ggplot2 library with the command library(ggplot2).
- Prepare the data: Put the data you want to visualize into a data frame. It should contain at least two columns, one for each of the two variables you want to plot.
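The correlation coefficient described above can be computed directly before any plotting. The sketch below is an illustration rather than part of the original walkthrough: it assumes the built-in mtcars dataset and uses base R's cor(), which computes the Pearson coefficient by default.

```r
# Pearson correlation between fuel economy (mpg) and weight (wt)
# in the built-in mtcars dataset (an illustrative choice of data)
r <- cor(mtcars$mpg, mtcars$wt)

# A value close to -1 means a strong negative linear relationship
print(round(r, 2))
```

For these two columns the coefficient is strongly negative (about -0.87), which matches the downward trend seen when they are plotted against each other.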
Basic correlation Scatter Plot using ggplot2:
The first thing we'll do is load the necessary packages and create a sample dataset. For the example below, we'll use the built-in mtcars dataset, which contains information on various car models and their specifications.
R
library(ggplot2)
# Create a sample dataset
data(mtcars)
df <- mtcars[, c("mpg", "wt")]
Next, we'll use the ggplot() function to create a plot object and the geom_point() function to add points to the plot, with mpg on the x-axis and wt on the y-axis:
R
# Create a basic scatter plot
ggplot(df, aes(x = mpg, y = wt)) +
  geom_point()
Output:
Scatter plot using ggplot2
It is often useful to add a regression line to the plot to show the overall trend in the data. To do this, we can use the geom_smooth() function:
R
# Add a regression line
ggplot(df, aes(x = mpg, y = wt)) +
  geom_point() +
  geom_smooth(method = "lm")
This snippet adds a regression line fitted by linear regression; by default, geom_smooth() also draws a shaded confidence band around the line.
Here's another example of a correlation scatter plot using the ggplot2 package. For this example, we'll use the iris dataset, which contains the sepal and petal dimensions of iris flowers from three species.
R
# Create a sample dataset
data(iris)
df <- iris[, c("Sepal.Length", "Sepal.Width",
               "Petal.Length", "Petal.Width",
               "Species")]
Then, we'll use the ggplot() function to create a plot object, and the geom_point() function to add points to the plot with Sepal.Length on the x-axis and Petal.Length on the y-axis. We'll also use the aes() function to map the color of points to different Species of iris flowers.
R
# Create a scatter plot with
# color mapped to Species
ggplot(df, aes(x = Sepal.Length,
               y = Petal.Length,
               color = Species)) +
  geom_point()
Now, to add regression lines to the plot, we use the geom_smooth() function with the method argument set to "lm" for linear regression. Because color is mapped to Species inside aes(), a separate line is fitted for each species:
R
# Add a linear regression line
ggplot(df, aes(x = Sepal.Length,
               y = Petal.Length,
               color = Species)) +
  geom_point() +
  geom_smooth(method = "lm")
To further customize the plot, we can use the facet_wrap() function to create a separate panel for each Species of iris flower:
R
# Create a scatter plot with color
# mapped to Species, faceted by Species
ggplot(df, aes(x = Sepal.Length,
               y = Petal.Length,
               color = Species)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~Species, ncol = 2)
To summarise this example: we loaded the ggplot2 package, created a sample dataset, and used ggplot() to initialize a plot object. We then used geom_point() to add points to the plot, with the color of the points mapped to Species through the aes() function. Next, we added regression lines with geom_smooth(), setting the method argument to "lm" for linear regression. Finally, we used facet_wrap() to create a separate panel for each Species and set the number of columns with the ncol argument.
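The per-species trends shown in the panels can also be summarised numerically. The following is a minimal sketch, not part of the original example: it assumes the built-in iris data and uses base R's split(), sapply() and cor() to compute one Pearson coefficient per species. The names by_species and r_by_species are illustrative.

```r
# Split the built-in iris data by species, then compute the Pearson
# correlation between sepal length and petal length within each group
by_species <- split(iris, iris$Species)
r_by_species <- sapply(by_species, function(d) {
  cor(d$Sepal.Length, d$Petal.Length)
})

# One coefficient per species, named by species
print(round(r_by_species, 2))
```

Each value corresponds to one panel of the faceted plot; the closer a coefficient is to 1, the more tightly that species' points follow an upward-sloping line.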
Scatter Plot of the mpg dataset using ggplot2
As before, we'll first load the necessary packages and create a sample dataset. For this example, we'll use the mpg dataset that ships with ggplot2 and contains information on various cars and their fuel economy:
R
library(ggplot2)
# Create a sample dataset
data(mpg)
df <- mpg[, c("displ", "hwy", "cyl", "class")]
Next, we'll use the ggplot() function to create a plot object and the aes() function to map the displ column to the x-axis and the hwy column to the y-axis. We'll also map the color of the points to the cyl column and the shape of the points to the class column, and use geom_point() to draw them. Finally, we'll use scale_shape_manual() to set the point shapes manually, supplying one shape code for each of the seven vehicle classes.
R
# Create a scatter plot with color
# and shape mapped to cyl and class
ggplot(df, aes(x = displ, y = hwy,
               color = factor(cyl),
               shape = factor(class))) +
  geom_point() +
  scale_shape_manual(values = c(15, 16, 17,
                                18, 19, 24, 25))
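Now, to add regression lines to the plot, we can use the stat_smooth() function with the method argument set to "lm" for linear regression. Setting se = FALSE turns off the shaded confidence bands. Because color and shape are mapped in the top-level aes(), the smoother inherits that grouping and fits a separate line for each group of points rather than a single overall line: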
R
# Add linear regression lines
# without confidence bands (se = FALSE)
ggplot(df, aes(x = displ, y = hwy,
               color = factor(cyl),
               shape = factor(class))) +
  geom_point() +
  scale_shape_manual(values = c(15, 16, 17,
                                18, 19, 24, 25)) +
  stat_smooth(method = "lm", se = FALSE)
To further customize the plot, we can change the color palette using the scale_color_brewer() function with palette = "Set1", which gives the cylinder groups more clearly distinguishable colors.
R
# Customize colors and shapes of points
ggplot(df, aes(x = displ, y = hwy,
               color = factor(cyl),
               shape = factor(class))) +
  geom_point() +
  scale_shape_manual(values = c(15, 16, 17,
                                18, 19, 24, 25)) +
  stat_smooth(method = "lm", se = FALSE) +
  scale_color_brewer(palette = "Set1")
Finally, we can use the labs() function to add custom axis and legend labels:
R
# Add custom axis and legend labels
ggplot(df, aes(x = displ, y = hwy,
               color = factor(cyl),
               shape = factor(class))) +
  geom_point() +
  scale_shape_manual(values = c(15, 16, 17,
                                18, 19, 24, 25)) +
  stat_smooth(method = "lm", se = FALSE) +
  scale_color_brewer(palette = "Set1") +
  labs(x = "Engine displacement (L)",
       y = "Highway fuel economy (mpg)",
       color = "Number of cylinders",
       shape = "Vehicle class")
To summarise this example: we created a correlation scatter plot with engine displacement (displ) on the x-axis and highway fuel economy (hwy) on the y-axis, with the color and shape of the points mapped to the number of cylinders (cyl) and the vehicle class (class). The plot also includes linear regression lines, drawn without confidence bands because se = FALSE, and custom labels for the axes and legends. The point shapes are set manually with scale_shape_manual(), and the colors come from the "Set1" palette through scale_color_brewer().
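When color and shape are mapped in the top-level aes(), the smoothing layer inherits them and fits lines per group. If a single overall trend line is wanted instead, one option is to move those mappings into geom_point() so the smoother sees only x and y. The following is a minimal sketch under that assumption; the black line color is an illustrative choice, not part of the original example.

```r
library(ggplot2)

# Only x and y are mapped at the top level, so the smoother
# treats all rows as a single group and fits one line
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Color and shape apply to the points layer only
  geom_point(aes(color = factor(cyl),
                 shape = factor(class))) +
  scale_shape_manual(values = c(15, 16, 17,
                                18, 19, 24, 25)) +
  stat_smooth(method = "lm", se = FALSE,
              color = "black") +
  labs(x = "Engine displacement (L)",
       y = "Highway fuel economy (mpg)",
       color = "Number of cylinders",
       shape = "Vehicle class")

print(p)
```

The points keep their color and shape legends, while the single line summarises the overall relationship between displacement and highway fuel economy.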
Conclusion:
In this article, we demonstrated how to create a correlation scatter plot in R using the ggplot2 library. We've discussed the concepts of scatter plots, correlation, and ggplot2, and provided step-by-step instructions on how to create a scatter plot. Three detailed examples were also provided to showcase the capabilities of ggplot2. The information in the article should be useful for anyone looking to visualize the relationship between two variables using a scatter plot in R.