Drawing Only Boundaries of stat_smooth in ggplot2 using R
Last Updated: 23 Sep, 2024
When creating plots with ggplot2, you often use stat_smooth() to add a smooth curve that visualizes trends in your data. By default, stat_smooth() draws both the smoothed line and a shaded confidence interval. In some cases, however, you may want to show only the boundaries of the confidence interval without filling the area between them. This article covers how to draw only those boundaries (without shading) using stat_smooth() in ggplot2.
Overview of stat_smooth()
The stat_smooth() function in ggplot2 adds a smoothed conditional mean to a plot. It often includes:
- A line representing the estimated trend.
- A shaded region representing the confidence interval around the smoothed line.
The goal here is to display only the boundaries of the confidence interval and remove the shaded region.
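To see what the shaded region is built from, it can help to inspect the values that stat_smooth() computes. The sketch below uses ggplot2's layer_data() helper on an illustrative plot object named p (a name chosen for this sketch, not part of the original example). The formula = y ~ x argument is passed explicitly only to suppress the informational message. The ymin and ymax columns hold the lower and upper boundaries of the confidence interval.

```r
library(ggplot2)

# Illustrative plot object; `p` is a name chosen for this sketch
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  stat_smooth(method = "lm", formula = y ~ x, se = TRUE)

# Extract the data computed for the first (and only) layer
smooth_data <- layer_data(p, 1)

# y is the fitted trend; ymin and ymax bound the confidence interval
head(smooth_data[, c("x", "y", "ymin", "ymax")])
```

Each row corresponds to one x position along the smoothed curve, so plotting ymin and ymax against x traces the two boundary lines.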
Method 1: Default Behavior of stat_smooth()
Let’s first see how stat_smooth() behaves by default, with both the trend line and the shaded confidence interval. We will use the mtcars dataset as an example.
R
# Load the required libraries
library(ggplot2)

# Create a basic scatter plot with stat_smooth()
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_smooth(method = "lm", se = TRUE) +
  labs(title = "Default stat_smooth() with Confidence Interval")
Output:
Default Behavior of stat_smooth()
- geom_point(): Creates a scatter plot.
- stat_smooth(): Adds a smoothed line (method = "lm" specifies linear regression) and the shaded confidence interval (since se = TRUE).
By default, the plot shows a regression line with a shaded confidence interval.
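The width of the shaded band depends on the confidence level, which stat_smooth() exposes through its level argument (0.95 by default). As a small sketch, the comparison below overlays the default 95% band on a wider 99% band. The fill colors and the name p_levels are arbitrary choices for illustration.

```r
library(ggplot2)

p_levels <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Wider 99% band, drawn first so it sits underneath
  stat_smooth(method = "lm", formula = y ~ x, level = 0.99, fill = "orange") +
  # Default 95% band drawn on top
  stat_smooth(method = "lm", formula = y ~ x, level = 0.95, fill = "grey40") +
  labs(title = "95% and 99% Confidence Bands")

p_levels
```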
Method 2: Drawing Only the Boundaries of the Confidence Interval
To show only the boundaries of the confidence interval without filling the region, we can use the following steps:
- Turn off the shading by setting se = FALSE in stat_smooth().
- Manually add the confidence interval boundaries using geom_ribbon() or geom_line().
Step 1: Remove the Shading with se = FALSE
By setting se = FALSE in stat_smooth(), you can remove the shaded region:
R
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  labs(title = "Smoothed Line without Confidence Interval Shading")
Output:
This removes the confidence interval entirely, but the goal is to add just the boundaries back.
Step 2: Add Boundaries Using geom_line() and predict()
To manually add the boundaries of the confidence interval, use the predict() function to calculate the upper and lower bounds, then plot them using geom_line().
R
# Fit a linear model
fit <- lm(mpg ~ wt, data = mtcars)

# Create a data frame with fitted values and confidence intervals.
# predict() returns a matrix with columns fit, lwr, and upr; passing it as
# `mpg` makes data.frame() name the columns mpg.fit, mpg.lwr, and mpg.upr.
pred_data <- data.frame(
  wt = mtcars$wt,
  mpg = predict(fit, newdata = mtcars, interval = "confidence")
)

# Create the plot
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Smoothed line (linewidth replaces the deprecated size argument in ggplot2 >= 3.4.0)
  geom_line(aes(y = mpg.fit), data = pred_data, color = "blue", linewidth = 1) +
  geom_line(aes(y = mpg.lwr), data = pred_data, linetype = "dashed", color = "red") + # Lower boundary
  geom_line(aes(y = mpg.upr), data = pred_data, linetype = "dashed", color = "red") + # Upper boundary
  labs(title = "Linear Regression with Confidence Interval Boundaries",
       x = "Weight",
       y = "Miles per Gallon") +
  theme_minimal()
Output:
- predict() is used to calculate the fitted values and the confidence intervals.
- geom_line() is used to add the smoothed line as a solid line and the upper/lower boundaries of the confidence interval as dashed lines.
- The smoothed line is blue, and the boundaries are represented by dashed red lines.
This plot shows the smoothed regression line with the upper and lower confidence interval boundaries, but without shading between the two boundaries.
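The same manual approach can be adapted to a non-linear smoother. The sketch below is an illustration rather than part of the original example: it fits loess() with its default settings, evaluates the fit on an evenly spaced grid of wt values, and builds an approximate 95% interval as fit ± t × standard error. The names loess_fit, grid, and band are chosen for this sketch.

```r
library(ggplot2)

# Fit a local regression (loess) with default span and degree
loess_fit <- loess(mpg ~ wt, data = mtcars)

# Evenly spaced x values so the boundary lines are smooth
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100))

# predict() with se = TRUE returns the fit, its standard error, and the df
pred <- predict(loess_fit, newdata = grid, se = TRUE)
t_crit <- qt(0.975, df = pred$df)  # two-sided 95% critical value

# Assemble the fitted values and interval boundaries
band <- data.frame(
  wt  = grid$wt,
  fit = pred$fit,
  lwr = pred$fit - t_crit * pred$se.fit,
  upr = pred$fit + t_crit * pred$se.fit
)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_line(aes(y = fit), data = band, color = "blue") +
  geom_line(aes(y = lwr), data = band, linetype = "dashed", color = "red") +
  geom_line(aes(y = upr), data = band, linetype = "dashed", color = "red")
```

Predicting on a grid instead of the observed wt values keeps the boundary curves smooth even where the data are sparse.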
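Method 3: Using stat_smooth() with geom = "ribbon" for Built-In Confidence Interval Boundaries
If you want a simpler approach that stays within ggplot2, you can let stat_smooth() compute the interval and draw it with the ribbon geom. Note that setting fill = NA on geom_smooth() itself is not enough: geom_smooth() draws its confidence band without an outline, so removing the fill hides the band completely, and linetype and color then affect only the fitted line. With geom = "ribbon", fill = NA removes the shading while color and linetype style the outline of the band.
R
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Fitted regression line only
  stat_smooth(method = "lm", se = FALSE, color = "blue") +
  # Confidence band drawn as an unfilled ribbon, so only its outline shows
  stat_smooth(method = "lm", geom = "ribbon", fill = NA, linetype = "dashed", color = "red") +
  labs(title = "Confidence Interval Boundaries Without Shading",
       x = "Weight",
       y = "Miles per Gallon") +
  theme_minimal()
Output:
- geom = "ribbon": Draws the interval computed by stat_smooth() as a ribbon, whose outline can be styled.
- fill = NA: Removes the shading of the confidence interval.
- linetype = "dashed": Draws the confidence interval boundaries as dashed lines.
- color = "red": Sets the color of the confidence interval boundaries.
This plot displays the regression line with dashed confidence interval boundaries and no shading.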
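As a hedged variation for smoothers other than "lm", the band computed by stat_smooth() can be drawn with the ribbon geom, whose outline can be styled separately from its fill. The sketch below assumes a reasonably recent ggplot2 release, where the ribbon outline covers only the upper and lower edges. The name p_band is chosen for this illustration.

```r
library(ggplot2)

p_band <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Loess trend line without any band
  stat_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "blue") +
  # Same smoother drawn as an unfilled ribbon: only the outline is visible
  stat_smooth(method = "loess", formula = y ~ x, geom = "ribbon",
              fill = NA, color = "red", linetype = "dashed")

p_band
```

Because ggplot2 computes the interval itself, no separate model fit or predict() call is needed here.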
Conclusion
In this guide, we learned how to draw only the boundaries of the confidence interval, without shading, using stat_smooth() and geom_line() in ggplot2. Whether you prefer to calculate the confidence intervals manually and plot them or use a simpler built-in method, R provides flexible options for customizing your plots.