Methods to Minimize False Negatives and False Positives in Binary Classification
Last Updated: 23 Jul, 2025
When we build a Machine Learning model, different issues can arise such as overfitting, underfitting, or a dip in Precision and Recall values. A dip in Precision means the number of False Positives has increased, while a dip in Recall means the number of False Negatives has increased.
- False Positives: occur when the actual value is negative but the model predicts it as positive. For example, a binary model that classifies a person as criminal when they are actually innocent produces a false positive.
- False Negatives: occur when the actual value is positive but the model predicts it as negative. For example, a binary model that classifies a person suffering from a disease as healthy produces a false negative.
Throughout this article we use the breast cancer dataset, which has two classes: benign and malignant. We will go through several strategies to minimize false negatives and false positives in binary classification, including optimizing the decision threshold, handling imbalanced datasets, choosing appropriate metrics, regularizing the model, and calibrating predictions.
Methods to Minimize False Negatives
- Adjusting the Decision Threshold: One of the simplest methods to reduce false negatives is by adjusting the decision threshold of the classifier. By default, many classifiers use a threshold of 0.5 for binary decisions. Lowering this threshold can help capture more positive instances, thus reducing false negatives.
- Cost-sensitive Learning: Implementing cost-sensitive learning allows the model to assign different costs to false negatives and false positives. By emphasizing the cost of false negatives, the model can be trained to minimize these errors more effectively.
- Data Augmentation: Increasing the diversity and quantity of training data through data augmentation techniques can help improve model generalization and reduce false negatives. This involves creating synthetic data points or transforming existing data to enhance model learning.
- Ensemble Methods: Using ensemble methods like bagging or boosting can improve model performance by combining multiple models' predictions. Techniques such as Random Forests or Gradient Boosting Machines often yield better accuracy and lower false negative rates.
- Feature Engineering: Carefully selecting and engineering features that are highly indicative of the positive class can help in reducing false negatives. This involves domain knowledge and exploratory data analysis to identify key features; a small feature-selection sketch follows this list.
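To make the feature engineering point concrete, here is a minimal sketch of univariate feature selection on the breast cancer dataset; the use of SelectKBest with the f_classif score and k=10 is our own illustrative choice, not something prescribed by the article.
Python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

# Load the breast cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Keep the 10 features with the strongest ANOVA F-score against the label
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)

print("Original shape:", X.shape)          # (569, 30)
print("Reduced shape:", X_selected.shape)  # (569, 10)
print("Selected features:", data.feature_names[selector.get_support()])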
Methods to Minimize False Positives
- Precision-Recall Trade-off: Focusing on optimizing precision rather than accuracy can help reduce false positives. Precision measures the proportion of true positive predictions among all positive predictions, thus prioritizing correct identification over mere prediction frequency.
- Regularization Techniques: Applying regularization techniques like L1 or L2 regularization can prevent overfitting, which often leads to high false positive rates. Regularization helps in simplifying models by penalizing complex ones that might fit noise in the data.
- Cross-validation: Implementing cross-validation techniques ensures that the model's performance is consistent across different subsets of data, reducing overfitting and consequently minimizing false positives.
- Anomaly Detection Techniques: In cases where the positive class is rare (e.g., fraud detection), anomaly detection algorithms can be employed to identify outliers as potential positives, thereby reducing false positives by focusing on unusual patterns; a short sketch of this idea follows this list.
- Model Calibration: Calibrating models using techniques like Platt scaling or isotonic regression can adjust predicted probabilities closer to true likelihoods, thus refining decision boundaries and reducing false positives.
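To illustrate the anomaly detection idea mentioned above, here is a minimal sketch using IsolationForest. The contamination value (the expected fraction of anomalies) is an assumption for illustration, and the breast cancer data only stands in for a domain such as fraud where the positive class is genuinely rare.
Python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import IsolationForest

# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data

# Fit the detector on the feature matrix alone; no labels are used
detector = IsolationForest(contamination=0.1, random_state=42)
pred = detector.fit_predict(X)  # 1 = inlier, -1 = flagged as anomaly

flagged = np.where(pred == -1)[0]
print(f"Flagged {len(flagged)} of {len(X)} samples as potential anomalies")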
1. Adjusting the Decision Threshold
The decision threshold is the cut-off applied to the predicted probability: if the probability is greater than the threshold (0.5 by default) we assign class 1, otherwise class 0. Adjusting this threshold directly influences the number of False Positives and False Negatives.
If we lower the threshold, Recall increases (fewer False Negatives); if we raise it, Precision increases (fewer False Positives).
Python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data # Features
y = data.target # Labels (0 = malignant, 1 = benign)
# Split the data into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a logistic regression model
model = LogisticRegression(max_iter=1000)
# Train the model
model.fit(X_train, y_train)
# Predict probabilities for the test data
y_prob = model.predict_proba(X_test)[:, 1] # Probabilities for the positive class (benign)
# Lower the decision threshold from the default 0.5 to reduce false negatives
threshold = 0.1534
y_pred_threshold = (y_prob >= threshold).astype(int)
# Evaluate the model with the adjusted threshold
accuracy = accuracy_score(y_test, y_pred_threshold)
cm = confusion_matrix(y_test, y_pred_threshold)
report = classification_report(y_test, y_pred_threshold)
# Print the results
print(f"Accuracy with threshold {threshold}: {accuracy * 100:.2f}%")
print("Confusion Matrix:\n", cm)
print("Classification Report:\n", report)
Output:
Accuracy with threshold 0.1534: 95.61%
Confusion Matrix:
[[38 5]
[ 0 71]]
Classification Report:
precision recall f1-score support
0 1.00 0.88 0.94 43
1 0.93 1.00 0.97 71
accuracy 0.96 114
macro avg 0.97 0.94 0.95 114
weighted avg 0.96 0.96 0.96 114
Here we used a Logistic Regression model to classify the tumours and lowered the threshold to reduce False Negatives. From the confusion matrix above we can see that the number of False Negatives is 0, at the cost of 5 False Positives.
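The specific threshold of 0.1534 is not derived in the article; one possible way to pick such a value systematically (our own sketch, reusing y_test and y_prob from the snippet above) is to scan the thresholds returned by precision_recall_curve and take the largest one that still keeps Recall at 1.0, i.e. zero false negatives.
Python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Reuses y_test and y_prob from the snippet above
precision, recall, thresholds = precision_recall_curve(y_test, y_prob)

# recall has one more entry than thresholds; drop the last value to align them
mask = recall[:-1] == 1.0  # thresholds at which every positive is still caught
best_threshold = thresholds[mask].max() if mask.any() else 0.5

print(f"Largest threshold with recall = 1.0: {best_threshold:.4f}")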
2. Cost-sensitive Learning
Cost-sensitive learning is particularly useful when the dataset is imbalanced. Here we give priority to the minority class, in other words we assign a higher weight to it. For the cancer dataset, we first count the samples in each class and then let the model assign weights accordingly.
Python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import numpy as np
# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data # Features
y = data.target # Labels (0 = malignant, 1 = benign)
# Count the occurrences of each class (0 = malignant, 1 = benign)
unique, counts = np.unique(y, return_counts=True)
class_distribution = dict(zip(unique, counts))
print(f"Class distribution:\nMalignant (0): {class_distribution[0]}\nBenign (1): {class_distribution[1]}")
# Split the data into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a logistic regression model with class_weight='balanced' (cost-sensitive learning)
model = LogisticRegression(max_iter=1000, class_weight='balanced')
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test data
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)
# Print the results
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Confusion Matrix:\n", cm)
print("Classification Report:\n", report)
Output:
Class distribution:
Malignant (0): 212
Benign (1): 357
Accuracy: 96.49%
Confusion Matrix:
[[40 3]
[ 1 70]]
Classification Report:
precision recall f1-score support
0 0.98 0.93 0.95 43
1 0.96 0.99 0.97 71
accuracy 0.96 114
macro avg 0.97 0.96 0.96 114
weighted avg 0.97 0.96 0.96 114
Here we set class_weight='balanced', so the model assigns a higher weight to the class with the lower frequency.
3. Precision-Recall Trade-off
The precision-recall trade-off is about striking a balance between the two metrics. Accuracy alone rarely gives a complete picture of model performance, so we also use the F1 score, which is the harmonic mean of Precision and Recall. We do not need to compute it manually since it is included in the classification report, and we can also plot Precision and Recall against the decision threshold, as shown below.
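Here is a minimal sketch of that idea, assuming a trained classifier (model) and the test split (X_test, y_test) from the earlier snippets: it plots Precision and Recall against the decision threshold and prints the F1 score at the default 0.5 cut-off.
Python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, f1_score

# Predicted probabilities for the positive class (reusing the earlier model)
y_prob = model.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, y_prob)

# Plot precision and recall as the decision threshold varies
plt.figure(figsize=(8, 5))
plt.plot(thresholds, precision[:-1], label='Precision')
plt.plot(thresholds, recall[:-1], label='Recall')
plt.xlabel('Decision threshold')
plt.ylabel('Score')
plt.title('Precision-Recall Trade-off')
plt.legend()
plt.grid()
plt.show()

# F1 score (harmonic mean of precision and recall) at the default threshold
print("F1 score at threshold 0.5:", f1_score(y_test, (y_prob >= 0.5).astype(int)))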
4. Using ROC Curve and AUC Optimization
The ROC (Receiver Operating Characteristic) curve shows how well the model separates the two classes: the False Positive Rate is plotted on the X axis and the True Positive Rate on the Y axis. The AUC (Area Under the Curve) summarizes this curve in a single number; the closer it is to 1, the better the model.
To push the AUC as close to 1 as possible, we can tune hyperparameters with a grid search, selecting the parameters that maximize AUC under cross-validation.
Python
import warnings
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve, classification_report
# Ignore all warnings
warnings.filterwarnings("ignore")
# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a logistic regression model
model = LogisticRegression(max_iter=900)
# Set up hyperparameter grid for tuning the 'C' parameter
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100]}
# Set up grid search to optimize ROC AUC (scikit-learn's built-in 'roc_auc' scorer)
grid_search = GridSearchCV(model, param_grid, scoring='roc_auc', cv=5)
# Fit the model
grid_search.fit(X_train, y_train)
# Best model and parameters
best_model = grid_search.best_estimator_
best_params = grid_search.best_params_
print(f"Best Parameters: {best_params}")
# Predict probabilities and calculate AUC on the test set
y_prob = best_model.predict_proba(X_test)[:, 1] # Probability of positive class (benign)
auc_score = roc_auc_score(y_test, y_prob)
print(f"Optimized AUC: {auc_score:.2f}")
# Predict class labels for the test set
y_pred = best_model.predict(X_test)
# Generate classification report
report = classification_report(y_test, y_pred)
print("Classification Report:\n", report)
# Calculate ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
# Plot the ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label=f'ROC Curve (AUC = {auc_score:.2f})', color='blue')
plt.plot([0, 1], [0, 1], color='red', linestyle='--') # Diagonal line
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.grid()
plt.show()
5. Resampling Techniques (Oversampling/Undersampling)
Resampling means increasing or decreasing the number of samples when the dataset is imbalanced so that the final dataset becomes balanced. There are two techniques for balancing the dataset: Oversampling and Undersampling.
- Undersampling: reduces the number of samples of the majority class.
- Oversampling: increases the number of minority-class samples, often by creating synthetic ones.
For oversampling we can use SMOTE, and for undersampling we can randomly drop samples from the majority class.
1. SMOTE
SMOTE (Synthetic Minority Over-sampling Technique) generates synthetic samples of the minority class by interpolating between existing minority samples. It is part of the imbalanced-learn library.
Python
import warnings
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve
from imblearn.over_sampling import SMOTE
# Ignore all warnings
warnings.filterwarnings("ignore")
# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Apply SMOTE to the training set
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
# Create a logistic regression model
model = LogisticRegression(max_iter=1000)
# Train model on resampled data
model.fit(X_resampled, y_resampled)
# Predict on the test set
y_pred = model.predict(X_test)
# Evaluate the model
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
Output:
Confusion Matrix:
[[41 2]
[ 1 70]]
Classification Report:
precision recall f1-score support
0 0.98 0.95 0.96 43
1 0.97 0.99 0.98 71
accuracy 0.97 114
macro avg 0.97 0.97 0.97 114
weighted avg 0.97 0.97 0.97 114
2. Random Undersampling
In this method, we randomly remove some datapoints from the majority class so that the overall dataset becomes balanced.
Python
import warnings
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from imblearn.under_sampling import RandomUnderSampler
# Ignore all warnings
warnings.filterwarnings("ignore")
# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Apply Random Undersampling to the training set
undersampler = RandomUnderSampler(random_state=42)
X_resampled, y_resampled = undersampler.fit_resample(X_train, y_train)
# Train model on resampled data
model = LogisticRegression(max_iter=900)
model.fit(X_resampled, y_resampled)
# Predict and evaluate
y_pred = model.predict(X_test)
# Print confusion matrix and classification report
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
Output:
Confusion Matrix:
[[41 2]
[ 1 70]]
Classification Report:
precision recall f1-score support
0 0.98 0.95 0.96 43
1 0.97 0.99 0.98 71
accuracy 0.97 114
macro avg 0.97 0.97 0.97 114
weighted avg 0.97 0.97 0.97 114
6. Regularization Methods
Overfitting is a scenario where the model performs well on training data but poorly on test or unseen data, which hurts both Precision and Recall. Regularization keeps the model from fitting noise in the data:
- Decision Trees: If we are using Decision Tree algorithm, we can prune our trees or reduce the max depth.
- Support Vector Machines: For SVM algorithm, we can reduce the value of C (hyperparameter) or use different kernels.
- Logistic Regression: For Logistic Regression, we can introduce penalties (L1, L2 or elastic net) so that the model generalizes better; a quick sketch follows this list.
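As a quick illustration of the logistic regression option, here is a minimal sketch with an L1 penalty and the liblinear solver; C=0.5 is an assumed value chosen only for illustration.
Python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load the breast cancer dataset and split it
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# L1 penalty: a smaller C means stronger regularization and more coefficients pushed to zero
model = LogisticRegression(penalty='l1', solver='liblinear', C=0.5, max_iter=1000)
model.fit(X_train, y_train)

print("Non-zero coefficients:", int((model.coef_ != 0).sum()))
print(classification_report(y_test, model.predict(X_test)))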
Below we implement a Support Vector Machine with an RBF kernel and C set to 1.
Python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.svm import SVC
# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data # Features
y = data.target # Labels (0 = malignant, 1 = benign)
# Split the data into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create an SVM classifier
model = SVC(C=1.0, kernel='rbf') # Using RBF kernel
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test data
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)
# Print the results
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Confusion Matrix:\n", cm)
print("Classification Report:\n", report)
Output:
Accuracy: 94.74%
Confusion Matrix:
[[37 6]
[ 0 71]]
Classification Report:
precision recall f1-score support
0 1.00 0.86 0.93 43
1 0.92 1.00 0.96 71
accuracy 0.95 114
macro avg 0.96 0.93 0.94 114
weighted avg 0.95 0.95 0.95 114
7. Ensemble Models
Ensemble methods combine the predictions of multiple models. They are popular because they improve precision and recall by reducing overfitting. There are two main categories of ensemble methods:
- Bagging: each model is trained on a random subset of the data and makes its own prediction; the predictions are then combined by voting (for classification) or averaging (for regression) to get the final result.
- Boosting: models are trained sequentially, with each new model correcting the errors of the previous ones.
Here we use a Random Forest Classifier (Bagging) and AdaBoost (Boosting) to evaluate model performance.
Python
import warnings
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score, roc_auc_score, confusion_matrix, classification_report
# Ignore all warnings
warnings.filterwarnings("ignore")
# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Bagging with Random Forest Classifier
bagging_model = RandomForestClassifier(
n_estimators=100,
random_state=42
)
# Train the Bagging model
bagging_model.fit(X_train, y_train)
# Predict and evaluate Bagging model
y_pred_bagging = bagging_model.predict(X_test)
bagging_accuracy = accuracy_score(y_test, y_pred_bagging)
bagging_auc = roc_auc_score(y_test, bagging_model.predict_proba(X_test)[:, 1])
print("Bagging (Random Forest) Classifier:")
print(f"Accuracy: {bagging_accuracy:.2f}")
print(f"AUC: {bagging_auc:.2f}")
print(confusion_matrix(y_test, y_pred_bagging))
print(classification_report(y_test, y_pred_bagging))
# Boosting with AdaBoost Classifier
boosting_model = AdaBoostClassifier(
estimator=RandomForestClassifier(n_estimators=10), # Using Random Forest as base estimator
n_estimators=100,
random_state=42
)
# Train the Boosting model
boosting_model.fit(X_train, y_train)
# Predict and evaluate Boosting model
y_pred_boosting = boosting_model.predict(X_test)
boosting_accuracy = accuracy_score(y_test, y_pred_boosting)
boosting_auc = roc_auc_score(y_test, boosting_model.predict_proba(X_test)[:, 1])
print("\nBoosting (AdaBoost) Classifier:")
print(f"Accuracy: {boosting_accuracy:.2f}")
print(f"AUC: {boosting_auc:.2f}")
print(confusion_matrix(y_test, y_pred_boosting))
print(classification_report(y_test, y_pred_boosting))
8. Post-model Calibration
Most classification models output a probability or likelihood for each event, and these probabilities are often not well calibrated. Calibration adjusts them so that they better reflect true likelihoods. There are two common ways:
- Platt Scaling: fits a sigmoid function to the model's outputs, producing more realistic probabilities; the final class is then assigned by applying a threshold to these calibrated probabilities.
- Isotonic Regression: fits a non-decreasing step function to the classifier's outputs; classes are again predicted by applying a threshold to the calibrated values.
Python
import warnings
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve, classification_report
from sklearn.calibration import CalibratedClassifierCV
# Ignore warnings
warnings.filterwarnings("ignore")
# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# Apply Platt scaling
platt_scaling = CalibratedClassifierCV(model, method='sigmoid') # Platt scaling uses 'sigmoid'
platt_scaling.fit(X_train, y_train)
# Apply Isotonic regression
isotonic_reg = CalibratedClassifierCV(model, method='isotonic') # Isotonic regression
isotonic_reg.fit(X_train, y_train)
# Predict probabilities with the original, Platt-scaled, and Isotonic-regression models
y_prob_original = model.predict_proba(X_test)[:, 1]
y_prob_platt = platt_scaling.predict_proba(X_test)[:, 1]
y_prob_isotonic = isotonic_reg.predict_proba(X_test)[:, 1]
# Convert probabilities to class predictions using a threshold of 0.5
y_pred_original = (y_prob_original >= 0.5).astype(int)
y_pred_platt = (y_prob_platt >= 0.5).astype(int)
y_pred_isotonic = (y_prob_isotonic >= 0.5).astype(int)
# Print classification reports for each model
print("Classification Report (Original):")
print(classification_report(y_test, y_pred_original))
print("\nClassification Report (Platt Scaling):")
print(classification_report(y_test, y_pred_platt))
print("\nClassification Report (Isotonic Regression):")
print(classification_report(y_test, y_pred_isotonic))
Output:
Classification Report (Original):
precision recall f1-score support
0 0.97 0.91 0.94 43
1 0.95 0.99 0.97 71
accuracy 0.96 114
macro avg 0.96 0.95 0.95 114
weighted avg 0.96 0.96 0.96 114
Classification Report (Platt Scaling):
precision recall f1-score support
0 0.97 0.91 0.94 43
1 0.95 0.99 0.97 71
accuracy 0.96 114
macro avg 0.96 0.95 0.95 114
weighted avg 0.96 0.96 0.96 114
Classification Report (Isotonic Regression):
precision recall f1-score support
0 0.98 0.95 0.96 43
1 0.97 0.99 0.98 71
accuracy 0.97 114
macro avg 0.97 0.97 0.97 114
weighted avg 0.97 0.97 0.97 114
Balancing False Negatives and False Positives
Achieving a balance between minimizing false negatives and false positives requires careful consideration of the specific context and application requirements:
- Receiver Operating Characteristic (ROC) Curve: Analyzing ROC curves helps in understanding trade-offs between sensitivity (true positive rate) and specificity (true negative rate). The area under the ROC curve (AUC) provides a single metric for evaluating overall model performance.
- F1 Score Optimization: The F1 score is a harmonic mean of precision and recall, providing a balanced measure that considers both false positives and false negatives. Optimizing for F1 score ensures neither error dominates at the expense of overall performance.
- Domain-specific Cost Analysis: Understanding the domain-specific costs associated with each type of error is crucial for setting priorities in minimizing them. For example, in healthcare, reducing false negatives may take precedence due to potential life-threatening consequences; a small cost-based threshold sketch follows this list.
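To make the cost analysis concrete, here is a minimal sketch (reusing y_test and y_prob from the earlier threshold example) that picks the decision threshold minimizing total misclassification cost; the costs themselves, with a false negative counted as 10 times worse than a false positive, are assumptions for illustration only.
Python
import numpy as np
from sklearn.metrics import confusion_matrix

# Assumed domain-specific costs: a false negative is 10x worse than a false positive
cost_fp, cost_fn = 1, 10
thresholds = np.linspace(0.01, 0.99, 99)

# Total cost for each candidate threshold (y_test and y_prob from earlier snippets)
costs = []
for t in thresholds:
    y_pred = (y_prob >= t).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    costs.append(fp * cost_fp + fn * cost_fn)

best_t = thresholds[int(np.argmin(costs))]
print(f"Cost-minimizing threshold: {best_t:.2f}, total cost: {min(costs)}")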
Conclusion
Minimizing false negatives and false positives in binary classification is essential for building reliable models that perform well in real-world applications. By employing strategies such as adjusting decision thresholds, cost-sensitive learning, ensemble methods, precision-recall trade-offs, and model calibration, practitioners can significantly enhance model accuracy and reliability. Ultimately, understanding the specific context and balancing trade-offs between different types of errors will lead to more effective binary classification models tailored to application needs.