
Regression Analysis in R Programming

Last Updated : 12 Jul, 2025

Regression analysis is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It is commonly used for prediction, forecasting and estimating how strongly variables are related. R provides several regression techniques, each suited to different types of data and relationships.

Types of Regression Analysis

We will explore various types of regression in this section.

1. Linear Regression

Linear regression is one of the most common regression techniques. In its simplest form it models the relationship between a dependent variable and a single independent variable as:

y = ax + b

Where:

  • y is the dependent variable (response variable)
  • x is the independent variable (predictor)
  • a is the slope (coefficient)
  • b is the intercept
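
For a single predictor, the least-squares estimates have a simple closed form: the slope is a = \frac{\text{cov}(x, y)}{\text{var}(x)} and the intercept is b = \bar{y} - a\bar{x}. A quick sketch of computing them directly (using the same small dataset as the example below):

R
# sample data (same as the lm() example below)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

a <- cov(x, y) / var(x)      # slope
b <- mean(y) - a * mean(x)   # intercept

c(slope = a, intercept = b)  # agrees with coef(lm(y ~ x))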

Example: We are going to implement linear regression in R using the lm() function.

R
# sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# fit a simple linear model y = ax + b
model <- lm(y ~ x)

# coefficients, residuals and goodness-of-fit statistics
summary(model)

Output:

Linear Regression
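
Once fitted, the model can be used for prediction with predict(). A minimal follow-up sketch (the new x values 6 and 7 are made up purely for illustration):

R
coef(model)                          # fitted intercept and slope

new_data <- data.frame(x = c(6, 7))  # hypothetical new predictor values
predict(model, newdata = new_data)   # predicted y for the new x values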

2. Logistic Regression

Logistic regression is used for classification tasks, where the response variable is categorical (often binary). It estimates the probability of an event occurring using a logistic function:

y = \frac{1}{1 + e^{-z}}

Where:

  • y is the predicted probability (response variable).
  • z is a linear combination of independent variables.

Despite its name, logistic regression is used for classification rather than regression, because the response variable is categorical: the model outputs a probability between 0 and 1 that is then mapped to a class label. It is still called regression because of the mathematical form of the model.

Example: We are implementing logistic regression in R using the glm() function with a binomial family.

R
set.seed(1)  # for reproducibility

# simulated predictor (IQ scores) and a binary outcome of matching length
IQ <- rnorm(20, 30, 2)
result <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0)

df <- data.frame(IQ, result)

# logistic regression: binary outcome modelled with a binomial family
model <- glm(result ~ IQ, family = binomial, data = df)

summary(model)

Output:

Logistic Regression
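
The fitted model works on the log-odds scale by default; to turn it into probabilities and class labels, predict with type = "response" and apply a cutoff (the 0.5 threshold here is an arbitrary, illustrative choice):

R
# predicted probabilities for the training data
probs <- predict(model, type = "response")

# convert probabilities to class labels with a 0.5 cutoff
pred_class <- ifelse(probs > 0.5, 1, 0)

# compare predicted and actual classes
table(Predicted = pred_class, Actual = df$result)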

3. Polynomial Regression

Polynomial regression is used when the relationship between the independent and dependent variables is non-linear. It is a form of linear regression where we model the data using polynomial equations. The general equation for polynomial regression of degree n is:

y = a_nx^n + a_{n-1}x^{n-1} + \dots + a_1x + b

Where:

  • y is the dependent variable (response variable)
  • x is the independent variable (predictor)
  • a_1, \dots, a_n are the coefficients
  • b is the intercept
  • n is the degree of the polynomial

Example: We are implementing polynomial regression in R by adding polynomial terms to a linear model with poly().

R
# sample data with a quadratic relationship (y = x^2)
x <- c(1, 2, 3, 4, 5)
y <- c(1, 4, 9, 16, 25)

# fit a degree-2 polynomial using orthogonal polynomial terms
model <- lm(y ~ poly(x, 2))

summary(model)

Output:

Polynomial Regression
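
To visualise the fitted curve, predict over a fine grid of x values and overlay it on the original points. A minimal sketch (the grid size and plot styling are arbitrary choices):

R
# grid of x values for a smooth curve
x_grid <- seq(min(x), max(x), length.out = 100)
y_hat  <- predict(model, newdata = data.frame(x = x_grid))

plot(x, y, pch = 19, main = "Polynomial fit of degree 2")
lines(x_grid, y_hat, col = "blue")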

4. Lasso Regression

Lasso regression is a type of linear regression that uses L1 regularization, which helps in feature selection by shrinking some coefficients to zero. This technique is especially useful when there are many features, as it automatically selects the most significant predictors. The model for Lasso regression is represented as:

\text{Lasso (L1):} \quad \min_{\beta} \left( \text{Loss} + \lambda \|\beta\|_1 \right)

Where:

  • Loss = squared error ( \sum (y_i - \hat{y}_i)^2 )
  • \|\beta\|_1 = \sum |\beta_j|

Example: We are implementing Lasso regression in R using the glmnet package with \alpha = 1 to apply L1 regularization.

R
install.packages("glmnet")
library(glmnet)

x <- matrix(rnorm(100), ncol=10)
y <- rnorm(10)

model <- glmnet(x, y, alpha = 1)

print(model)

Output:

Lasso Regression
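
In practice the penalty strength \lambda is usually chosen by cross-validation with cv.glmnet(); the coefficients at the selected \lambda show which predictors have been shrunk exactly to zero. A sketch using the simulated data above (with purely random data, which coefficients end up at zero will vary):

R
cv_fit <- cv.glmnet(x, y, alpha = 1)   # 10-fold CV over a lambda path

cv_fit$lambda.min                      # lambda with the lowest CV error
coef(cv_fit, s = "lambda.min")         # coefficients; some are exactly zero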

5. Ridge Regression

Ridge regression is another regularized linear regression technique, but instead of L1 regularization (as in Lasso), it applies L2 regularization. This technique reduces the magnitude of the coefficients but does not set them to zero, which helps address multicollinearity in the data. The model for Ridge regression is represented as:

\text{Ridge (L2):} \quad \min_{\beta} \left( \text{Loss} + \lambda \|\beta\|_2^2 \right)

Where:

  • Loss = squared error \sum (y_i - \hat{y}_i)^2
  • \|\beta\|_2^2 = \sum \beta_j^2

Example: We are implementing Ridge regression in R using the glmnet package with \alpha = 0 to apply L2 regularization.

R
library(glmnet)

set.seed(2)                                # for reproducibility
x <- matrix(rnorm(100 * 10), ncol = 10)    # 100 observations, 10 predictors
y <- rnorm(100)                            # response vector

# alpha = 0 applies the L2 (ridge) penalty
model <- glmnet(x, y, alpha = 0)

print(model)

Output:

Ridge Regression
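
Since ridge shrinks coefficients without setting them to zero, it is common to inspect the coefficient paths across \lambda and to pick \lambda by cross-validation. A sketch continuing from the simulated data above (the object name cv_ridge is illustrative):

R
plot(model, xvar = "lambda", label = TRUE)   # coefficient paths vs log(lambda)

cv_ridge <- cv.glmnet(x, y, alpha = 0)       # choose lambda by cross-validation
coef(cv_ridge, s = "lambda.min")             # shrunken but non-zero coefficients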

6. Elastic Net Regression

Elastic Net regression combines both L1 and L2 regularization. It is useful when there are many correlated predictors and helps improve prediction accuracy.

The model for Elastic Net regression is a mix of Lasso and Ridge:

\text{Elastic Net:} \quad \min_{\beta} \left( \text{Loss} + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2 \right)

Where:

  • Loss = residual sum of squares \sum (y_i - \hat{y}_i)^2
  • \|\beta\|_1 = \sum |\beta_j|
  • \|\beta\|_2^2 = \sum \beta_j^2

Example: We are implementing Elastic Net regression in R using the glmnet package with a value of \alpha between 0 and 1, which mixes the Lasso and Ridge penalties.

R
library(glmnet)

set.seed(3)                                # for reproducibility
x <- matrix(rnorm(100 * 10), ncol = 10)    # 100 observations, 10 predictors
y <- rnorm(100)                            # response vector

# alpha = 0.5 mixes the L1 and L2 penalties equally
model <- glmnet(x, y, alpha = 0.5)

print(model)

Output:

Elastic Net Regression
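
Note that glmnet expresses the two penalties through a single \lambda and the mixing parameter \alpha rather than separate \lambda_1 and \lambda_2. As before, \lambda can be chosen by cross-validation and the fitted model used for prediction; the new observations below are simulated purely for illustration:

R
# choose lambda by cross-validation for the alpha = 0.5 mix
cv_enet <- cv.glmnet(x, y, alpha = 0.5)

coef(cv_enet, s = "lambda.min")            # coefficients at the selected lambda

# predict for new (simulated) observations
x_new <- matrix(rnorm(5 * 10), ncol = 10)
predict(cv_enet, newx = x_new, s = "lambda.min")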

In this article, we have covered multiple regression techniques in R. Each method serves a specific purpose depending on the nature of the data and the problem.
