How to Get Regression Model Summary from Scikit-Learn
Last Updated: 27 Jun, 2022
In this article, we are going to see how to get a regression model summary from scikit-learn.
It can be done in these ways:
- Scikit-learn package
- Statsmodels package
You may want to extract a summary of a regression model created in Python with scikit-learn. Scikit-learn does not offer many built-in functions for summarising a regression model because it is geared toward prediction, but the fitted estimator's attributes and methods cover the essentials (intercept, coefficients and score).
Example 1: Using scikit-learn attributes and methods
First, the necessary packages are imported and the iris dataset is loaded from sklearn.datasets. Feature and target arrays are created, the data is split into train and test sets with the train_test_split() method, and a simple linear regression model is fitted on the training data. Predictions are then made on the test set with the .predict() method, and the intercept, coefficients and R-squared score are printed as a summary.
Python3
# Import packages
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
# Load the data
irisData = load_iris()
# Create feature and target arrays
X = irisData.data
y = irisData.target
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42)
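# Create a linear regression model and fit it on the training data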
model = LinearRegression()
model.fit(X_train, y_train)
# predicting on the X_test data set
print(model.predict(X_test))
# summary of the model: intercept, coefficients and R-squared score
print('model intercept :', model.intercept_)
print('model coefficients : ', model.coef_)
print('Model score : ', model.score(X, y))
Output:
[ 1.23071715 -0.04010441 2.21970287 1.34966889 1.28429336 0.02248402
1.05726124 1.82403704 1.36824643 1.06766437 1.70031437 -0.07357413
-0.15562919 -0.06569402 -0.02128628 1.39659966 2.00022876 1.04812731
1.28102792 1.97283506 0.03184612 1.59830192 0.09450931 1.91807547
1.83296682 1.87877315 1.78781234 2.03362373 0.03594506 0.02619043]
model intercept : 0.2525275898181484
model coefficients : [-0.11633479 -0.05977785 0.25491375 0.54759598]
Model score : 0.9299538012397455
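Scikit-learn has no single summary() call, but additional summary statistics for the fitted model can be computed with sklearn.metrics. Below is a minimal sketch, continuing from the example above (it assumes model, X_test and y_test are still in scope):
Python3
# Extra summary statistics on the held-out test set using sklearn.metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = model.predict(X_test)
print('MAE :', mean_absolute_error(y_test, y_pred))
print('MSE :', mean_squared_error(y_test, y_pred))
print('R^2 :', r2_score(y_test, y_pred))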
Example 2: Using the summary() method of the statsmodels package
In this method, we use the statsmodels.formula.api package. If you want a full statistical summary of a regression model in Python, statsmodels is the usual choice. The code below fits a simple linear regression on a head-size/brain-weight dataset and prints the model summary.
The example uses a headbrain1.csv file (head size and brain weight measurements); download it before running the code.
Python3
# import packages
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
# loading the csv file
df = pd.read_csv('headbrain1.csv')
print(df.head())
# rename the columns and fit the model
df.columns = ['Head_size', 'Brain_weight']
model = smf.ols(formula='Head_size ~ Brain_weight', data=df).fit()
# model summary
print(model.summary())
Output: model.summary() prints an OLS Regression Results table (coefficients, standard errors, t-statistics, p-values, R-squared, F-statistic, and so on).
Description of some of the terms in the table:
- R-squared: ranges from 0 to 1. A value of 1 (100%) means the independent variable(s) completely explain the variation in the dependent variable, i.e. a perfect fit. In this example the R-squared value is 0.638.
- F-statistic: tests the joint significance of all the independent variables. In practical terms, if your alpha level is greater than the associated p-value, you reject the null hypothesis that the model has no explanatory power.
- coef: the estimated coefficients of the regression equation's independent variables.
Our conclusion:
Using 0.05 as the significance level, the p-value is less than 0.05, so we reject the null hypothesis and accept the alternative hypothesis. We can therefore conclude that there is a relationship between head size and brain weight.
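If only individual pieces of the summary are needed rather than the full table, the fitted statsmodels results object exposes them as attributes. A short sketch, assuming the fitted model from the example above is still in scope:
Python3
# Pull individual statistics out of the fitted statsmodels results object
print('Coefficients:\n', model.params)
print('p-values:\n', model.pvalues)
print('R-squared :', model.rsquared)
print('F-statistic :', model.fvalue, 'p-value:', model.f_pvalue)
print('95% confidence intervals:\n', model.conf_int())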