Joint Feature Selection with multi-task Lasso in Scikit Learn
This article introduces the concepts of Lasso and multi-task Lasso regression and demonstrates how to implement these methods in Python using the scikit-learn library.
It covers the differences between Lasso and multi-task Lasso, offers guidance on which method to prefer in different situations, shows how to implement both with scikit-learn, and finally demonstrates how to perform joint feature selection with multi-task Lasso.
What is Joint feature selection?
Joint feature selection is a method for selecting a subset of features to use as input to a machine learning model. The goal of joint feature selection is to select a set of features that are relevant to the prediction task and that work well together to improve the performance of the model.
Joint feature selection can be an important step in the machine learning process, as it can help to improve the performance of the model by selecting the most relevant and informative features from the dataset. It can also help to reduce the complexity of the model and make it more interpretable by eliminating unnecessary or redundant features.
What is Lasso?
Lasso is a type of linear regression that uses L1 regularization, which is a method for reducing overfitting and improving the generalization of the model by adding a penalty term to the objective function. L1 regularization involves adding a term to the objective function that is proportional to the absolute value of the coefficients of the features.
Lasso regression is particularly useful for feature selection, as the L1 regularization term encourages the coefficients of the less important features to be reduced to zero, effectively eliminating those features from the model. This results in sparse solutions, meaning that the final model will only include the most important features. Lasso regression can be used to improve the performance of a linear regression model by reducing overfitting and increasing the generalization of the model.
In general, Lasso regression is a type of shrinkage method that can be used to improve the prediction performance of a linear regression model. It is particularly well-suited for situations where the number of features is larger than the number of observations, or when you want to select a subset of the most important features from a larger dataset.
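As a quick illustration (a minimal sketch on synthetic data, separate from the walkthrough later in this article), you can see the sparsity Lasso induces by fitting it to a dataset where only a few features are informative:
Python3
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: only 3 of 10 features carry signal
X, y = make_regression(n_samples=100, n_features=10,
                       n_informative=3, noise=5.0,
                       random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)

# The L1 penalty drives many coefficients exactly to zero
print("Coefficients:", np.round(lasso.coef_, 2))
print("Kept features:", np.flatnonzero(lasso.coef_))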
What is Multi-task Lasso?
Multitask Lasso is a variant of Lasso regression that is designed to handle multiple tasks simultaneously. In multitask Lasso regression, the model is trained to predict multiple target variables at once, rather than just one. The regularization term in multitask Lasso regression is shared across all tasks, which means that the model can learn relationships between the tasks and use that information to improve the prediction performance.
Multitask Lasso regression is often used in multi-output or multi-task learning scenarios, where the goal is to predict multiple related target variables at once. It can be useful for situations where there are multiple tasks that are related and where it is possible to learn relationships between the tasks that can improve the prediction performance.
Like Lasso regression, multi-task Lasso uses sparsity-inducing regularization to reduce overfitting and improve the generalization of the model. However, instead of applying an L1 penalty to each task separately, it applies a mixed L1/L2 penalty to the coefficient matrix: an L2 norm is taken over each feature's coefficients across all tasks, and these norms are summed (L1) over features. This drives entire rows of the coefficient matrix to zero, so the same subset of features is selected for every task, which lets the model exploit relationships between the tasks to improve prediction performance.
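A minimal sketch on synthetic data (again separate from the walkthrough below, with arbitrary sizes and an illustrative alpha) makes the joint selection visible: with MultiTaskLasso, whole features are zeroed for all tasks at once:
Python3
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.RandomState(0)
n_samples, n_features, n_tasks = 100, 8, 3

# Only the first 3 features carry signal, shared by all tasks
W = np.zeros((n_features, n_tasks))
W[:3] = rng.randn(3, n_tasks)
X = rng.randn(n_samples, n_features)
Y = X @ W + 0.1 * rng.randn(n_samples, n_tasks)

model = MultiTaskLasso(alpha=0.5).fit(X, Y)

# coef_ has shape (n_tasks, n_features); the same
# features (columns) are zeroed for every task
print(np.round(model.coef_, 2))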
Difference between Lasso and multi-task Lasso using a toy dataset from scikit-learn
To compare the two approaches, we use the MultiTaskLasso model with the load_diabetes dataset from sklearn.datasets and perform feature selection using both recursive feature elimination (RFE) and Lasso regularization. First, import the required libraries.
Python3
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import RFE
from sklearn.linear_model import MultiTaskLasso, Lasso
from sklearn.model_selection import train_test_split
Load the load_diabetes dataset and split it into training and test sets.
Python3
# Load the diabetes dataset
X, y = load_diabetes(return_X_y=True)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
Reshape the target into a two-dimensional array, since MultiTaskLasso expects one column per task, and create a MultiTaskLasso model with an alpha value of 0.1.
Python3
# Reshape the targets to shape (n_samples, n_tasks):
# MultiTaskLasso expects one column per task
y_train = y_train[:, np.newaxis]
y_test = y_test[:, np.newaxis]

# Create a multi-task Lasso
# model with an alpha value of 0.1
model = MultiTaskLasso(alpha=0.1)
The MultiTaskLasso model is created with an alpha value of 0.1, which is a hyperparameter that controls the strength of the Lasso regularization term in the optimization objective. A smaller alpha value means that the model is more likely to select more features, while a larger alpha value means that the model is more likely to select fewer features.
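To see this effect, here is a quick optional sketch (reusing the training split created above); the exact counts are data-dependent and only meant to show the trend:
Python3
# Optional: see how alpha controls sparsity
# (counts are illustrative and data-dependent)
for alpha in [0.01, 0.1, 1.0]:
    m = MultiTaskLasso(alpha=alpha).fit(X_train, y_train)
    print("alpha =", alpha, "->",
          np.count_nonzero(m.coef_), "non-zero coefficients")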
Create an RFE object with the model and the desired number of features to select and fit the RFE object to the training data.
Python3
# Create an RFE object with the multi-task
# Lasso model and the desired number of
# features to select
rfe = RFE(model, n_features_to_select=3)
# Fit the RFE object to the training data
rfe.fit(X_train, y_train)
The RFE object is fit to the training data using the fit method. This will train the MultiTaskLasso model on the training data and perform recursive feature elimination to select the 3 most important features.
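If you want to inspect the elimination result directly, the fitted RFE object exposes a boolean support mask and a per-feature ranking (an optional check, not needed for the rest of the walkthrough):
Python3
# Optional: inspect the RFE result.
# support_ is True for the selected features;
# ranking_ is 1 for selected features, with higher
# numbers for features eliminated earlier
print("Support mask:", rfe.support_)
print("Feature ranking:", rfe.ranking_)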
Use the fitted RFE object to make predictions on the test set and get the indices of the features it selected. Then fit a separate Lasso model to the same training data.
Python3
# Use the multi-task Lasso model
# to make predictions on the test set
y_pred = rfe.predict(X_test)
# Get the indices of the selected features using RFE
selected_features_MultiTaskLasso = rfe.get_support(indices=True)
# Fit the Lasso model to the training data
model = Lasso(alpha=0.1)
model.fit(X_train, y_train)
A Lasso model is created with an alpha value of 0.1 and fit to the training data using the fit method. This trains the Lasso model and performs feature selection through the L1 penalty, which drives the coefficients of unimportant features to zero.
Get the indices of the selected features using Lasso regularization and print them.
Python3
# Get the indices of the selected
# features using Lasso regularization
selected_features_lasso = np.flatnonzero(model.coef_)
print("Selected features using MultiTaskLasso:",
selected_features_MultiTaskLasso)
print("Selected features using Lasso:",
selected_features_lasso)
Output:
Selected features using MultiTaskLasso: [2 3 8]
Selected features using Lasso: [1 2 3 4 6 8 9]
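These indices can be mapped back to the diabetes feature names (an optional lookup; load_diabetes exposes them via its feature_names attribute):
Python3
# Optional: map the selected indices to the
# diabetes feature names
feature_names = np.array(load_diabetes().feature_names)
print("MultiTaskLasso picks:",
      feature_names[selected_features_MultiTaskLasso])
print("Lasso picks:",
      feature_names[selected_features_lasso])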
Finally, compare the mean squared error (MSE) of the MultiTaskLasso and Lasso models on the test set.
Python3
# Calculate the MSE of the MultiTaskLasso model
mse_MultiTaskLasso = np.mean((y_test - y_pred)**2)
# Calculate the MSE of the Lasso model
y_pred_lasso = model.predict(X_test)
mse_lasso = np.mean((y_test - y_pred_lasso)**2)
# Print the MSE of the two models
print("MSE of MultiTaskLasso:", mse_MultiTaskLasso)
print("MSE of Lasso:", mse_lasso)
Output:
MSE of MultiTaskLasso: 2880.345311787223
MSE of Lasso: 7894.624137792652
The MultiTaskLasso model's predictions on the test set (y_pred) are used to calculate its MSE using the np.mean function and the squared error between the predictions and the true values (y_test). The Lasso model's predictions on the test set (y_pred_lasso) are similarly used to calculate its MSE. Finally, the MSE of the two models is printed using the print function.
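Equivalently, the same values can be computed with scikit-learn's built-in metric:
Python3
from sklearn.metrics import mean_squared_error

# Same computation using scikit-learn's metric
print("MSE of MultiTaskLasso:",
      mean_squared_error(y_test, y_pred))
print("MSE of Lasso:",
      mean_squared_error(y_test, y_pred_lasso))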
To visualize the MSE of the MultiTaskLasso and Lasso models on the test set in a graph, you can use the Matplotlib library:
Python3
import matplotlib.pyplot as plt
# Create a bar plot comparing the MSE of the two models
plt.bar(["MultiTaskLasso", "Lasso"],
[mse_MultiTaskLasso, mse_lasso])
plt.ylabel("MSE")
plt.show()
Output:
(Bar plot comparing the test-set MSE of the MultiTaskLasso and Lasso models)
The bar function from matplotlib.pyplot is used to create a bar plot with the names of the two models on the x-axis and their MSE on the y-axis. The ylabel function is used to add a label to the y-axis, and the show function is used to display the plot.
This will create a simple bar plot comparing the MSE of the MultiTaskLasso and Lasso models. You can customize the appearance of the plot by using additional functions from matplotlib.pyplot, such as title to add a title to the plot, xlabel to add a label to the x-axis, or ylim to set the limits of the y-axis.
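For example, a slightly more polished version of the same plot might look like this (the title, labels, and axis limit are just illustrative choices):
Python3
# A customized version of the same comparison plot
plt.bar(["MultiTaskLasso", "Lasso"],
        [mse_MultiTaskLasso, mse_lasso])
plt.title("Test-set MSE: MultiTaskLasso vs Lasso")
plt.xlabel("Model")
plt.ylabel("MSE")
plt.ylim(0, max(mse_MultiTaskLasso, mse_lasso) * 1.1)
plt.show()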