Machine Learning Algorithms
Last Updated :
23 Jul, 2025
Machine learning algorithms are essentially sets of instructions that allow computers to learn from data, make predictions, and improve their performance over time without being explicitly programmed. Machine learning algorithms are broadly categorized into three types:
- Supervised Learning: Algorithms learn from labeled data, where the input-output relationship is known.
- Unsupervised Learning: Algorithms work with unlabeled data to identify patterns or groupings.
- Reinforcement Learning: Algorithms learn by interacting with an environment and receiving feedback in the form of rewards or penalties.
.webp)
Supervised Learning Algorithms
Supervised learning algos are trained on datasets where each example is paired with a target or response variable, known as the label. The goal is to learn a mapping function from input data to the corresponding output labels, enabling the model to make accurate predictions on unseen data. Supervised learning problems are generally categorized into two main types: Classification and Regression. Most widely used supervised learning algorithms are:
1. Linear Regression
Linear regression is used to predict a continuous value by finding the best-fit straight line between input (independent variable) and output (dependent variable)
- Minimizes the difference between actual values and predicted values using a method called "least squares" to to best fit the data.
- Predicting a person’s weight based on their height or predicting house prices based on size.
2. Logistic Regression
Logistic regression predicts probabilities and assigns data points to binary classes (e.g., spam or not spam).
- It uses a logistic function (S-shaped curve) to model the relationship between input features and class probabilities.
- Used for classification tasks (binary or multi-class).
- Outputs probabilities to classify data into categories.
- Example : Predicting whether a customer will buy a product online (yes/no) or diagnosing if a person has a disease (sick/not sick).
Note : Despite its name, logistic regression is used for classification tasks, not regression.
3. Decision Trees
A decision tree splits data into branches based on feature values, creating a tree-like structure.
- Each decision node represents a feature; leaf nodes provide the final prediction.
- The process continues until a final prediction is made at the leaf nodes
- Works for both classification and regression tasks.
For more decision tree algorithms, you can explore:
4. Support Vector Machines (SVM)
SVMs find the best boundary (called a hyperplane) that separates data points into different classes.
- Uses support vectors (critical data points) to define the hyperplane.
- Can handle linear and non-linear problems using kernel functions.
- focuses on maximizing the margin between classes, making it robust for high-dimensional data or complex patterns.
5. k-Nearest Neighbors (k-NN)
KNN is a simple algorithm that predicts the output for a new data point based on the similarity (distance) to its nearest neighbors in the training dataset, used for both classification and regression tasks.
- Calculates distance between point with existing data points in training dataset using a distance metric (e.g., Euclidean, Manhattan, Minkowski)
- identifies k nearest neighbors to new data point based on the calculated distances.
- For classification, algorithm assigns class label that is most common among its k nearest neighbors.
- For regression, the algorithm predicts the value as the average of the values of its k nearest neighbors.
6. Naive Bayes
Based on Bayes' theorem and assumes all features are independent of each other (hence "naive")
- Calculates probabilities for each class and assigns the most likely class to a data point.
- Assumption of feature independence might not hold in all cases ( rarely true in real-world data )
- Works well for high-dimensional data.
- Commonly used in text classification tasks like spam filtering : Naive Bayes
7. Random Forest
Random forest is an ensemble method that combines multiple decision trees.
- Uses random sampling and feature selection for diversity among trees.
- Final prediction is based on majority voting (classification) or averaging (regression).
- Advantages : reduces overfitting compared to individual decision trees.
- Handles large datasets with higher dimensionality.
For in-depth understanding : What is Ensemble Learning? - Two types of ensemble methods in ML
7. Gradient Boosting (e.g., XGBoost, LightGBM, CatBoost)
These algorithms build models sequentially, meaning each new model corrects errors made by previous ones. Combines weak learners (like decision trees) to create a strong predictive model. Effective for both regression and classification tasks. : Gradient Boosting in ML
- XGBoost (Extreme Gradient Boosting) : Advanced version of Gradient Boosting that includes regularization to prevent overfitting. Faster than traditional Gradient Boosting, for large datasets.
- LightGBM (Light Gradient Boosting Machine): Uses a histogram-based approach for faster computation and supports categorical features natively.
- CatBoost: Designed specifically for categorical data, with built-in encoding techniques. Uses symmetric trees for faster training and better generalization.
For more ensemble learning and gradient boosting approaches, explore:
8. Neural Networks ( Including Multilayer Perceptron)
Neural Networks, including Multilayer Perceptrons (MLPs), are considered part of supervised machine learning algorithms as they require labeled data to train and learn the relationship between input and desired output; network learns to minimize the error using backpropagation algorithm to adjust weights during training.
- Multilayer Perceptron (MLP): Neural network with multiple layers of nodes.
- Used for both classification and regression ( Examples: image classification, spam detection, and predicting numerical values like stock prices or house prices)
For in-depth understanding : Supervised multi-layer perceptron model - What is perceptron?
Unsupervised Learning Algorithms
Unsupervised learning algos works with unlabeled data to discover hidden patterns or structures without predefined outputs. These are again divided into three main categories based on their purpose: Clustering, Association Rule Mining, and Dimensionality Reduction. First we'll see algorithms for Clustering, then dimensionality reduction and at last association.
1. Clustering
Clustering algorithms group data points into clusters based on their similarities or differences. The goal is to identify natural groupings in the data. Clustering algorithms are divided into multiple types based on the methods they use to group data. These types include Centroid-based methods, Distribution-based methods, Connectivity-based methods, and Density-based methods. For resources and in-depth understanding, go through the links below.
- Centroid-based Methods: Represent clusters using central points, such as centroids or medoids.
- Distribution-based Methods
- Connectivity based methods
- Density Based methods
2. Dimensionality Reduction
Dimensionality reduction is used to simplify datasets by reducing the number of features while retaining the most important information.
3. Association Rule
Find patterns (called association rules) between items in large datasets, typically in market basket analysis (e.g., finding that people who buy bread often buy butter). It identifies patterns based solely on the frequency of item occurrences and co-occurrences in the dataset.
Reinforcement Learning Algorithms
Reinforcement learning involves training agents to make a sequence of decisions by rewarding them for good actions and penalizing them for bad ones. Broadly categorized into Model-Based and Model-Free methods, these approaches differ in how they interact with the environment.
1. Model-Based Methods
These methods use a model of the environment to predict outcomes and help the agent plan actions by simulating potential results.
2. Model-Free Methods
These methods do not build or rely on an explicit model of the environment. Instead, the agent learns directly from experience by interacting with the environment and adjusting its actions based on feedback. Model-Free methods can be further divided into Value-Based and Policy-Based methods:
- Value-Based Methods: Focus on learning the value of different states or actions, where the agent estimates the expected return from each action and selects the one with the highest value.
- Policy-based Methods: Directly learn a policy (a mapping from states to actions) without estimating values where the agent continuously adjusts its policy to maximize rewards.
Discover the Top 15 Machine Learning Algorithms for Interview Preparation.
Overview of Machine Learning
Similar Reads
Machine Learning Tutorial Machine learning is a branch of Artificial Intelligence that focuses on developing models and algorithms that let computers learn from data without being explicitly programmed for every task. In simple words, ML teaches the systems to think and understand like humans by learning from the data.Do you
5 min read
Introduction to Machine Learning
Python for Machine Learning
Machine Learning with Python TutorialPython language is widely used in Machine Learning because it provides libraries like NumPy, Pandas, Scikit-learn, TensorFlow, and Keras. These libraries offer tools and functions essential for data manipulation, analysis, and building machine learning models. It is well-known for its readability an
5 min read
Pandas TutorialPandas (stands for Python Data Analysis) is an open-source software library designed for data manipulation and analysis. Revolves around two primary Data structures: Series (1D) and DataFrame (2D)Built on top of NumPy, efficiently manages large datasets, offering tools for data cleaning, transformat
6 min read
NumPy Tutorial - Python LibraryNumPy is a core Python library for numerical computing, built for handling large arrays and matrices efficiently.ndarray object â Stores homogeneous data in n-dimensional arrays for fast processing.Vectorized operations â Perform element-wise calculations without explicit loops.Broadcasting â Apply
3 min read
Scikit Learn TutorialScikit-learn (also known as sklearn) is a widely-used open-source Python library for machine learning. It builds on other scientific libraries like NumPy, SciPy and Matplotlib to provide efficient tools for predictive data analysis and data mining.It offers a consistent and simple interface for a ra
3 min read
ML | Data Preprocessing in PythonData preprocessing is a important step in the data science transforming raw data into a clean structured format for analysis. It involves tasks like handling missing values, normalizing data and encoding variables. Mastering preprocessing in Python ensures reliable insights for accurate predictions
6 min read
EDA - Exploratory Data Analysis in PythonExploratory Data Analysis (EDA) is a important step in data analysis which focuses on understanding patterns, trends and relationships through statistical tools and visualizations. Python offers various libraries like pandas, numPy, matplotlib, seaborn and plotly which enables effective exploration
6 min read
Feature Engineering
Supervised Learning
Unsupervised Learning
Model Evaluation and Tuning
Advance Machine Learning Technique
Machine Learning Practice