Understanding Feature Importance in Logistic Regression Models
Logistic regression is a fundamental classification algorithm in machine learning and statistics. It is widely used for binary classification tasks and can be extended to multiclass problems. Understanding which features influence a logistic regression model's predictions is crucial for interpretability and for improving the model's performance.
This article delves into various methods to determine feature importance in logistic regression, providing a comprehensive guide for data scientists and machine learning practitioners.
Overview of Logistic Regression
Logistic regression is a statistical technique for binary classification problems, where the categorical outcome variable has two possible values (e.g., yes/no, true/false, 0/1). Unlike linear regression, which predicts continuous values, logistic regression predicts the probability that a given input belongs to a specific class.
The logistic regression model converts the linear combination of input features into a probability value between 0 and 1 by using the logistic (or sigmoid) function.
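As a minimal sketch (the coefficient and feature values below are purely illustrative), the model first computes a linear score from the inputs and then passes it through the sigmoid to obtain a probability:
Python
import numpy as np

def sigmoid(z):
    # Squash any real-valued score into a probability between 0 and 1
    return 1 / (1 + np.exp(-z))

# Linear combination of inputs: z = b0 + b1*x1 + b2*x2 (illustrative values)
z = 0.5 + 1.2 * 0.8 + (-0.7) * 1.5
print(f"Predicted probability of the positive class: {sigmoid(z):.3f}")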
Next, we will delve into the methods used to determine the importance of features in a logistic regression model.
Feature Importance Techniques for Logistic Models
1. Coefficient Magnitude
The simplest way to assess feature importance in logistic regression is to look at the magnitude of the coefficients (β). Features with larger absolute coefficient values are considered more important. Each coefficient represents the change in the log odds of the outcome for a one-unit change in the predictor variable, holding all other variables constant.
- Positive Coefficient: Indicates that an increase in the predictor variable increases the log odds of the positive class.
- Negative Coefficient: Indicates that an increase in the predictor variable decreases the log odds of the positive class.
For standardized features, the magnitude of the coefficients can be directly compared to assess the relative importance of each feature.
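Below is a minimal sketch of this idea, assuming the scikit-learn breast cancer dataset used later in this article: the features are standardized first so that their coefficient magnitudes are directly comparable.
Python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Standardize features so coefficient magnitudes can be compared directly
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))
pipe.fit(X, y)

# Rank features by absolute standardized coefficient
coefs = pd.Series(pipe[-1].coef_[0], index=X.columns)
print(coefs.abs().sort_values(ascending=False).head())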
2. Odds Ratios
Another way to interpret the coefficients is through odds ratios. The odds ratio for a feature is obtained by exponentiating its coefficient (e^β):
- Odds Ratio > 1: The feature increases the odds of the outcome.
- Odds Ratio < 1: The feature decreases the odds of the outcome.
- Odds Ratio = 1: The feature does not affect the odds of the outcome.
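For example, a coefficient of 0.69 corresponds to an odds ratio of exp(0.69) ≈ 2.0, meaning a one-unit increase in that feature roughly doubles the odds of the positive class, holding the other features constant.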
3. Recursive Feature Elimination (RFE)
RFE fits the model repeatedly, removing the least important feature (or features) at each iteration until the desired number of features remains.
- Fit the model and rank the features.
- Eliminate the least important feature(s).
- Repeat until the desired number of features remains.
4. L1 Regularization (Lasso)
L1 regularization adds a penalty proportional to the absolute value of the coefficients, which shrinks some coefficients exactly to zero and thus produces a sparse model.
- Fit the logistic regression model with L1 regularization.
- Features with non-zero coefficients are considered important (see the sketch below).
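A minimal sketch, again assuming the breast cancer dataset and standardized features; the penalty strength C=0.1 is an arbitrary illustrative choice (smaller C means stronger regularization and more coefficients pushed to zero):
Python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# L1-penalized logistic regression; the liblinear solver supports the L1 penalty
lasso_lr = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty='l1', solver='liblinear', C=0.1, max_iter=10000)
)
lasso_lr.fit(X, y)

# Features whose coefficients were not shrunk to zero
coefs = pd.Series(lasso_lr[-1].coef_[0], index=X.columns)
print("Features kept by L1 regularization:")
print(coefs[coefs != 0])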
5. Cross-Validation
Cross-validation is useful for evaluating the model's stability and performance across different feature subsets. Features that consistently lead to strong performance are considered important.
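As a rough sketch of this idea, the snippet below compares 5-fold cross-validation accuracy for the full feature set against a hand-picked subset; the subset chosen here ('worst area', 'worst concave points', 'mean texture') is purely illustrative.
Python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Compare an illustrative feature subset against the full feature set
subset = ['worst area', 'worst concave points', 'mean texture']
full_scores = cross_val_score(LogisticRegression(max_iter=10000), X, y, cv=5)
subset_scores = cross_val_score(LogisticRegression(max_iter=10000), X[subset], y, cv=5)

print(f"All features:  {full_scores.mean():.3f}")
print(f"Subset only:   {subset_scores.mean():.3f}")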
6. Permutation Importance
Permutation importance involves randomly shuffling the values of a feature and measuring the resulting drop in model performance. A larger drop indicates a more important feature.
- Fit the model and record the baseline performance.
- Shuffle the values of a single feature and re-evaluate the model.
- Compute the drop in performance.
- Repeat for every feature and rank the features by the performance drop.
Feature Importance in Logistic Regression with Scikit-Learn
Here is a Python code example using scikit-learn to demonstrate how to assess feature importance in a logistic regression model. This example includes coefficient magnitudes, odds ratios, and permutation importance.
Step 1: Import Libraries
Python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.inspection import permutation_importance
from sklearn.feature_selection import RFE
Step 2: Load and Prepare Dataset
Python
# Load dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)
Step 3: Split Dataset into Training and Test Sets
Python
# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
Step 4: Create and Fit Logistic Regression Model
Python
# Create and fit logistic regression model
model = LogisticRegression(max_iter=10000, solver='liblinear')
model.fit(X_train, y_train)
Step 5: Calculate Model Accuracy
Python
# Calculate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")
Output:
Model Accuracy: 0.96
Step 6: Compute Coefficients and Odds Ratios
Python
# Coefficients and Odds Ratios
coefficients = model.coef_[0]
odds_ratios = np.exp(coefficients)
# Display feature importance using coefficients and odds ratios
feature_importance = pd.DataFrame({
'Feature': X.columns,
'Coefficient': coefficients,
'Odds Ratio': odds_ratios
})
print("\nFeature Importance (Coefficient and Odds Ratio):")
print(feature_importance.sort_values(by='Coefficient', ascending=False))
Output:
Feature Importance (Coefficient and Odds Ratio):
Feature Coefficient Odds Ratio
0 mean radius 2.175329 8.805078
11 texture error 1.403643 4.070000
20 worst radius 1.155193 3.174638
1 mean texture 0.159658 1.173109
12 perimeter error 0.117866 1.125094
19 fractal dimension error -0.000769 0.999231
3 mean area -0.004002 0.996006
14 smoothness error -0.014646 0.985461
23 worst area -0.021324 0.978902
15 compactness error -0.024838 0.975468
9 mean fractal dimension -0.029289 0.971135
17 concave points error -0.041148 0.959687
18 symmetry error -0.048783 0.952388
16 concavity error -0.063487 0.938487
10 radius error -0.066118 0.936020
22 worst perimeter -0.076792 0.926082
13 area error -0.109265 0.896493
29 worst fractal dimension -0.110785 0.895131
2 mean perimeter -0.125372 0.882168
4 mean smoothness -0.130413 0.877733
8 mean symmetry -0.202222 0.816914
24 worst smoothness -0.242144 0.784943
7 mean concave points -0.350106 0.704613
21 worst texture -0.390328 0.676835
5 mean compactness -0.411271 0.662807
27 worst concave points -0.617351 0.539371
6 mean concavity -0.655026 0.519429
28 worst symmetry -0.729143 0.482322
25 worst compactness -1.139760 0.319896
26 worst concavity -1.579345 0.206110
Step 7: Compute Permutation Importance
Python
# Permutation Importance
perm_importance = permutation_importance(model, X_test, y_test, n_repeats=30, random_state=42, n_jobs=-1)
perm_importance_df = pd.DataFrame({
'Feature': X.columns,
'Importance Mean': perm_importance.importances_mean,
'Importance Std': perm_importance.importances_std
})
print("\nPermutation Importance:")
print(perm_importance_df.sort_values(by='Importance Mean', ascending=False))
Output:
Permutation Importance:
Feature Importance Mean Importance Std
23 worst area 0.475244 0.037474
2 mean perimeter 0.147173 0.023298
13 area error 0.119493 0.022431
22 worst perimeter 0.111696 0.020051
0 mean radius 0.098441 0.023102
21 worst texture 0.082066 0.018523
20 worst radius 0.053216 0.018204
3 mean area 0.024172 0.013651
1 mean texture 0.003509 0.008075
11 texture error 0.001559 0.006213
17 concave points error 0.000000 0.000000
28 worst symmetry 0.000000 0.000000
27 worst concave points 0.000000 0.000000
24 worst smoothness 0.000000 0.000000
19 fractal dimension error 0.000000 0.000000
18 symmetry error 0.000000 0.000000
15 compactness error 0.000000 0.000000
16 concavity error 0.000000 0.000000
14 smoothness error 0.000000 0.000000
10 radius error 0.000000 0.000000
9 mean fractal dimension 0.000000 0.000000
8 mean symmetry 0.000000 0.000000
7 mean concave points 0.000000 0.000000
5 mean compactness 0.000000 0.000000
4 mean smoothness 0.000000 0.000000
29 worst fractal dimension 0.000000 0.000000
26 worst concavity -0.000195 0.003197
6 mean concavity -0.000195 0.001050
25 worst compactness -0.000975 0.002179
12 perimeter error -0.001949 0.003143
Step 8: Apply Recursive Feature Elimination (RFE)
Python
# Recursive Feature Elimination (RFE)
rfe_model = LogisticRegression(max_iter=10000, solver='liblinear')
rfe = RFE(rfe_model, n_features_to_select=5)
rfe.fit(X_train, y_train)
rfe_features = X.columns[rfe.support_]
print("\nSelected Features by RFE:")
print(rfe_features)
Output:
Selected Features by RFE:
Index(['mean radius', 'mean concavity', 'worst radius', 'worst concavity',
'worst concave points'],
dtype='object')
Full Implementation Code:
Python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.inspection import permutation_importance
# Load dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)
# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create and fit logistic regression model
model = LogisticRegression(max_iter=10000, solver='liblinear')
model.fit(X_train, y_train)
# Calculate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")
# Coefficients and Odds Ratios
coefficients = model.coef_[0]
odds_ratios = np.exp(coefficients)
# Display feature importance using coefficients and odds ratios
feature_importance = pd.DataFrame({
'Feature': X.columns,
'Coefficient': coefficients,
'Odds Ratio': odds_ratios
})
print("\nFeature Importance (Coefficient and Odds Ratio):")
print(feature_importance.sort_values(by='Coefficient', ascending=False))
# Permutation Importance
perm_importance = permutation_importance(model, X_test, y_test, n_repeats=30, random_state=42, n_jobs=-1)
perm_importance_df = pd.DataFrame({
'Feature': X.columns,
'Importance Mean': perm_importance.importances_mean,
'Importance Std': perm_importance.importances_std
})
print("\nPermutation Importance:")
print(perm_importance_df.sort_values(by='Importance Mean', ascending=False))
# Recursive Feature Elimination (RFE)
from sklearn.feature_selection import RFE
rfe_model = LogisticRegression(max_iter=10000, solver='liblinear')
rfe = RFE(rfe_model, n_features_to_select=5)
rfe.fit(X_train, y_train)
rfe_features = X.columns[rfe.support_]
print("\nSelected Features by RFE:")
print(rfe_features)
Comparison of Methods: When to Use
| Technique | Description | Interpretation | When to Use |
|---|---|---|---|
| Coefficient Magnitude | Evaluate feature significance by coefficient magnitude | Positive/negative coefficient | Quick initial evaluation |
| Odds Ratios | Interpret coefficients through odds ratios | Odds ratio greater/less than 1 | Interpretable measure of feature importance |
| Recursive Feature Elimination (RFE) | Iteratively remove the least significant features | Rank features, eliminate the least important | Selecting a subset of the most important features |
| L1 Regularization (Lasso) | Add a penalty that encourages sparsity | Features with non-zero coefficients | High-dimensional datasets, feature selection |
| Cross-Validation | Evaluate model stability and performance with different feature subsets | Identify consistently strong features | Model stability and performance evaluation |
| Permutation Importance | Measure the performance drop when feature values are permuted | Rank features by performance drop | Detailed understanding of feature contributions |
Handling Multicollinearity
Multicollinearity occurs when predictor variables are highly correlated, which can inflate the variance of the coefficient estimates and make the model unstable. To handle multicollinearity, consider the following approaches:
- Remove Highly Correlated Features: Identify pairs of highly correlated features and remove one feature from each pair (a simple sketch follows this list).
- Combine Features: Create a new feature that combines the information from the correlated features.
- Regularization: Use techniques like L1 (Lasso) or L2 (Ridge) regularization to penalize large coefficients.
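Here is a minimal sketch of the first approach, assuming the breast cancer dataset; the 0.95 correlation threshold is an arbitrary illustrative cutoff.
Python
import pandas as pd
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True, as_frame=True)

# Flag feature pairs whose absolute correlation exceeds the chosen threshold
corr = X.corr().abs()
threshold = 0.95  # illustrative cutoff
to_drop = set()
for i, col_a in enumerate(corr.columns):
    for col_b in corr.columns[i + 1:]:
        if corr.loc[col_a, col_b] > threshold and col_b not in to_drop:
            to_drop.add(col_b)  # keep the first feature, drop the second

print("Candidate features to drop:", sorted(to_drop))
X_reduced = X.drop(columns=list(to_drop))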
Applications of Understanding Feature Importance
Understanding feature importance is crucial in various applications, such as:
- Medical Diagnosis: Identifying the most important features contributing to the diagnosis of diseases can help in developing more accurate and efficient diagnostic tools.
- Marketing: Determining the most important features influencing customer behavior can aid in creating targeted marketing campaigns.
- Financial Analysis: Evaluating the importance of features in predicting stock prices or credit risk can improve investment decisions.
Conclusion
Determining feature importance in logistic regression is essential for model interpretability and improvement. Various methods, including coefficient magnitudes, odds ratios, recursive feature elimination, L1 regularization, cross-validation, and permutation importance, provide insights into which features are most influential. Handling multicollinearity and correctly interpreting the coefficients are crucial steps in this process. By understanding and applying these techniques, data scientists can build more transparent and effective logistic regression models.