How To Create/Customize Your Own Scorer Function In Scikit-Learn?

Last Updated : 28 Apr, 2025

Scikit-learn is a widely used Python machine learning library that provides tools for building models and a framework for evaluating them with a range of metrics and scoring functions. The built-in metrics do not cover every case, however, and sometimes you need a scoring function tailored to your problem. Scikit-learn supports this, and in this article we'll go over how to create and customize your own scorer.

A scoring function in scikit-learn accepts two arguments, the ground truth (actual values) and the model's predicted values, and returns a single number that measures how good the predictions are. Scikit-learn ships predefined metrics such as accuracy, precision, recall, and F1-score. To use a function of your own during model evaluation, you wrap it in a scorer object, which scikit-learn calls as scorer(estimator, X, y).
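The difference between a plain metric function and a scorer object can be illustrated with a built-in metric. The snippet below is a minimal sketch; the choice of the iris dataset and LogisticRegression is an assumption made only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, get_scorer

# Illustrative data and model (not part of the original walkthrough)
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# A metric function compares true labels with predictions you supply.
metric_value = accuracy_score(y, clf.predict(X))

# A scorer wraps a metric and is called with (estimator, X, y);
# it runs the prediction step itself.
accuracy_scorer = get_scorer("accuracy")
scorer_value = accuracy_scorer(clf, X, y)

print(metric_value, scorer_value)
```

Both calls report the same accuracy; the scorer form is what cross-validation utilities expect in their scoring argument.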

Custom scorer for a regression problem

To create a custom scorer in scikit-learn, follow these steps:

Step 1: Create a custom function that evaluates the accuracy

Create a Python function that accepts two arguments, in this order: the ground truth (actual values) and the model's predicted values. The function should return a single score that measures how close the predictions are to the actual values.

Here we define the coefficient of determination (R²) as the custom function.

The coefficient of determination (R²) is a statistical measure of how well a regression model predicts an outcome. It measures the proportion of variance in the dependent variable that is explained by the independent input variable(s).

R^2 = 1- \frac{RSS}{TSS}

Here,

RSS, the residual sum of squares (also called the sum of squared errors), measures the variation that the regression model does not explain. It is the sum of squared differences between the predicted values and the actual target values.

RSS = \sum(pred-actual)^2

TSS, the total sum of squares, represents the total variation in the dependent variable. It is the sum of squared differences between the actual values and their mean.

TSS = \sum (actual-mean)^2

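The value of R² is at most 1, with higher values indicating a better fit. A value of 1 indicates a perfect fit, while a value of 0 means the model does no better than always predicting the mean of the actual values. R² can also be negative when the model fits worse than that mean baseline, which can happen on held-out data.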
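A small worked example makes the formula concrete. The numbers below are made up for illustration: the actual values 3, 5, 7, 9 have mean 6, so TSS = 9 + 1 + 1 + 9 = 20. Note that a model worse than the mean predictor yields a negative R².

```python
import numpy as np

# Hypothetical actual values; their mean is 6, so TSS = 20
y_true = np.array([3.0, 5.0, 7.0, 9.0])
tss = np.sum((y_true - y_true.mean()) ** 2)

def r2(y_pred):
    # R² = 1 - RSS / TSS for the fixed y_true above
    rss = np.sum((y_true - y_pred) ** 2)
    return 1 - rss / tss

r2_good = r2(np.array([2.5, 5.5, 7.0, 9.5]))  # RSS = 0.75 -> 0.9625
r2_mean = r2(np.full(4, y_true.mean()))       # RSS = 20   -> 0.0
r2_bad = r2(np.array([9.0, 7.0, 5.0, 3.0]))   # RSS = 80   -> -3.0

print(r2_good, r2_mean, r2_bad)
```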

Python3

import numpy as np

def r_squared(y_true, y_pred):
    # Calculate the mean of the true values
    mean_y_true = np.mean(y_true)

    # Calculate the sum of squares of residuals and total sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - mean_y_true) ** 2)

    # Calculate R²
    r2 = 1 - (ss_res / ss_tot)

    return r2
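Before wrapping a hand-written metric in a scorer, it can help to check it against a known implementation. The sketch below compares the function above with scikit-learn's built-in r2_score on synthetic data; the random data and the sklearn_r2_score alias are assumptions introduced for this check.

```python
import numpy as np
from sklearn.metrics import r2_score as sklearn_r2_score

def r_squared(y_true, y_pred):
    # Same definition as in Step 1: 1 - RSS / TSS
    mean_y_true = np.mean(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - mean_y_true) ** 2)
    return 1 - (ss_res / ss_tot)

# Synthetic targets and noisy predictions (illustrative only)
rng = np.random.default_rng(0)
y_true = rng.normal(size=100)
y_pred = y_true + rng.normal(scale=0.5, size=100)

custom = r_squared(y_true, y_pred)
builtin = sklearn_r2_score(y_true, y_pred)

print(np.isclose(custom, builtin))
```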

Step 2: Create a scorer object

Once the scoring function is defined, create a scorer object with scikit-learn's make_scorer() function. make_scorer() takes the scoring function as an argument and returns a scorer object.

Python3

from sklearn.metrics import make_scorer

# Create a scorer object from the r_squared function defined in Step 1
r2_score = make_scorer(r_squared)
r2_score

Output:

make_scorer(r_squared)
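By default, make_scorer() assumes that a higher score is better. For an error metric, where lower is better, pass greater_is_better=False so that the scorer returns the negated value and model selection tools can still maximize it. The following is a minimal sketch: mean_abs_error is a hypothetical helper, and the diabetes dataset and LinearRegression are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer

def mean_abs_error(y_true, y_pred):
    # Hypothetical custom loss: average absolute difference
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

# greater_is_better=False makes the scorer return the negated loss
mae_scorer = make_scorer(mean_abs_error, greater_is_better=False)

X, y = load_diabetes(return_X_y=True)
model = LinearRegression().fit(X, y)

raw_error = mean_abs_error(y, model.predict(X))
scorer_value = mae_scorer(model, X, y)

print(raw_error, scorer_value)
```

The scorer value is the raw error with its sign flipped, so a less negative score means a better model.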

Step 3: Use the scorer object

After creating the scorer object, we can use it to assess a machine learning model's performance with scikit-learn's cross-validation functions, which evaluate the model on different subsets of the dataset, or with other model assessment tools.

Python3

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Load the California Housing Price dataset
X, y = fetch_california_housing(return_X_y=True)

# Create a Random Forest regression model
model = RandomForestRegressor()


# Evaluate the performance of the model using
# cross-validation with the custom R² scorer
scores = cross_val_score(model,
                         X, y,
                         cv=5, 
                         scoring=r2_score)

# Print the mean and standard deviation of the scores
print(f"R2 Squared: {scores.mean():.2f} +/- {scores.std():.2f}")

Output:

R2 Squared: 0.65 +/- 0.08
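A custom scorer can also drive hyperparameter search, since GridSearchCV accepts the same scoring argument as cross_val_score. The sketch below is illustrative only: the Ridge model, the alpha grid, and the diabetes dataset are assumptions and do not come from the example above.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

def r_squared(y_true, y_pred):
    # Same R² definition as in Step 1
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - (ss_res / ss_tot)

r2_scorer = make_scorer(r_squared)

X, y = load_diabetes(return_X_y=True)

# Pick the regularization strength that maximizes the custom score
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    scoring=r2_scorer,
    cv=5,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```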

Custom scorer for a multi-class classification problem

Steps:

Import the necessary libraries
Load the iris dataset
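Define multiple metrics such as accuracy_score, precision_score, recall_score, and f1_score with make_scorer, passing an averaging strategy (for example average='macro') to precision, recall, and F1 because the problem is multi-class
Create an XGBClassifier model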
Evaluate the model using cross-validation and the custom scorer
Print the mean scores for each metric
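The steps above can be sketched as follows. To keep the example dependent on scikit-learn alone, this sketch swaps in RandomForestClassifier where the steps name XGBClassifier; with the xgboost package installed, an XGBClassifier instance could be passed as the model instead. The macro averaging and the use of cross_validate to collect several metrics at once are also assumptions of this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, make_scorer,
                             precision_score, recall_score)
from sklearn.model_selection import cross_validate

# Load the iris dataset (three classes)
X, y = load_iris(return_X_y=True)

# Extra keyword arguments to make_scorer are forwarded to the metric
scoring = {
    "accuracy": make_scorer(accuracy_score),
    "precision": make_scorer(precision_score, average="macro"),
    "recall": make_scorer(recall_score, average="macro"),
    "f1": make_scorer(f1_score, average="macro"),
}

# Stand-in for XGBClassifier (see the note above)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# cross_validate accepts a dict of scorers and reports each one
results = cross_validate(model, X, y, cv=5, scoring=scoring)

for name in scoring:
    mean_score = results["test_" + name].mean()
    print(f"{name}: {mean_score:.3f}")
```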