Blind source separation using FastICA in Scikit Learn
Last Updated: 23 Jul, 2025
Independent Component Analysis (ICA) is a method for separating mixed signals into their original, statistically independent components. FastICA is a widely used, efficient algorithm for solving this problem, especially in Blind Source Separation, where the goal is to recover unknown source signals from observed mixtures. It is commonly applied in fields like audio processing, medical imaging and financial data analysis. FastICA comes in two variants:
- Deflation-based FastICA, where the components are estimated one at a time.
- Symmetric FastICA, where all components are estimated simultaneously.
FastICA can also use different nonlinearity functions. The deflation-based version can be extended to optimize the order in which components are extracted.
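In scikit-learn's FastICA class, the algorithm parameter selects the variant: "deflation", or "parallel" for the symmetric version. The fun parameter selects the nonlinearity: "logcosh", "exp" or "cube". The sketch below fits every combination on a small toy mixture. The two sources, the mixing matrix and max_iter=1000 are illustrative assumptions. whiten="unit-variance" assumes scikit-learn 1.1 or newer.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Toy data (illustrative assumption): a sine and a square wave, linearly mixed
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
X = S @ np.array([[1.0, 1.0], [0.5, 2.0]]).T

# algorithm="deflation": components extracted one at a time
# algorithm="parallel": components estimated simultaneously (symmetric FastICA)
# fun: the nonlinearity used in the fixed-point update
results = {}
for algorithm in ("deflation", "parallel"):
    for fun in ("logcosh", "exp", "cube"):
        ica = FastICA(n_components=2, algorithm=algorithm, fun=fun,
                      whiten="unit-variance", random_state=0, max_iter=1000)
        results[(algorithm, fun)] = ica.fit_transform(X)

for key, S_hat in results.items():
    print(key, S_hat.shape)
```

All six settings return estimated sources of the same shape. They can differ slightly in accuracy and convergence speed depending on how non-Gaussian the sources are.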
Blind Source Separation
Blind Source Separation (BSS) refers to the process of separating signals when:
- The source signals are unknown.
- The method of mixing is also unknown.
Even without knowing the signals or how they were mixed, FastICA can separate them, provided the sources are statistically independent, the mixing is linear, and at most one source is Gaussian. This is useful in many areas like sound processing, medical diagnostics and more.
Mathematical Explanation of FastICA Algorithm
Let there be n original source signals combined linearly into m observed mixed signals. The original signals (sources) are represented as a vector:
\mathbf{s} = \left( s_1, s_2, \ldots, s_n \right)^T
The observed mixed signals are:
\mathbf{x} = \left( x_1, x_2, \ldots, x_m \right)^T
The mixing process is modeled as:
\mathbf{x} = \mathbf{G}\mathbf{s}
where \mathbf{G} is an m \times n matrix of mixing coefficients (usually m \geq n). The goal is to find an unmixing matrix \mathbf{U} such that:
\mathbf{y} = \mathbf{U}\mathbf{x}
where \mathbf{y} approximates the original independent sources \mathbf{s}, up to their order, sign and scale.
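To make the model concrete, here is a small NumPy sketch with made-up sources and a random square mixing matrix G. If G were known and invertible, U = G^{-1} would undo the mixing exactly. In blind source separation G is unknown, so ICA has to estimate U from x alone. The sizes and signals are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 3, 3, 1000  # n sources, m mixtures, T samples (illustrative sizes)

# Each column of s is one time sample of the source vector (s_1, ..., s_n)^T
s = np.vstack([
    np.sin(np.linspace(0, 16, T)),           # sine source
    np.sign(np.sin(np.linspace(0, 24, T))),  # square-wave source
    rng.laplace(size=T),                     # heavy-tailed random source
])

G = rng.normal(size=(m, n))  # mixing matrix (unknown in a real BSS problem)
x = G @ s                    # observed mixtures, shape (m, T)

# With G known, U = G^{-1} recovers the sources: y = U x = s
U = np.linalg.inv(G)
y = U @ x
print(np.allclose(y, s))
```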
Step 1: Center the Data
Centering means making each observed signal zero-mean.
\tilde{\mathbf{x}} = \mathbf{x} - \mathbb{E}[\mathbf{x}]
where \mathbb{E}[\mathbf{x}] is the mean vector (mean of each signal).
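A minimal NumPy sketch of centering follows. It assumes the signals are stored as rows of an array x, with one column per sample. The data is randomly generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three observed signals (rows) with non-zero means, 500 samples each
x = rng.normal(loc=[[2.0], [-1.0], [5.0]], scale=1.0, size=(3, 500))

mean = x.mean(axis=1, keepdims=True)  # sample estimate of E[x], one mean per signal
x_centered = x - mean                 # x~ = x - E[x]

print(mean.ravel())              # sample means, near the loc values above
print(x_centered.mean(axis=1))   # numerically zero for every signal
```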
Step 2: Whiten the Data
Whitening removes correlations between components and sets their variances to 1, so the covariance matrix of the whitened data is the identity matrix. First compute the covariance matrix of the centered data:
\mathbf{C} = \mathbb{E}[\tilde{\mathbf{x}} \tilde{\mathbf{x}}^T]
Perform an eigenvalue decomposition:
\mathbf{C} = \mathbf{E} \mathbf{D} \mathbf{E}^T
where:
- \mathbf{E} is the matrix of eigenvectors,
- \mathbf{D} is the diagonal matrix of eigenvalues.
The whitening matrix \mathbf{V} is:
\mathbf{V} = \mathbf{D}^{-\frac{1}{2}} \mathbf{E}^T
Apply whitening:
\mathbf{z} = \mathbf{V} \tilde{\mathbf{x}}
Now \mathbf{z} has covariance:
\mathbb{E}[\mathbf{z} \mathbf{z}^T] = \mathbf{I}
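Whitening simplifies the problem. After whitening, the remaining unmixing matrix for \mathbf{z} is orthogonal, so only a rotation has to be estimated, and each weight vector \mathbf{w} can be kept at unit norm.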
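Steps 1 and 2 can be sketched in NumPy as follows. The sketch assumes signals are rows of x; the two sources and the mixing matrix are made-up values. The covariance of the whitened data z comes out as the identity matrix, up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
# Illustrative independent sources: random +/-1 values and uniform noise
s = np.vstack([np.sign(rng.normal(size=T)), rng.uniform(-1, 1, size=T)])
G = np.array([[1.0, 0.5], [0.3, 2.0]])  # made-up mixing matrix
x = G @ s

x_tilde = x - x.mean(axis=1, keepdims=True)  # Step 1: centering
C = (x_tilde @ x_tilde.T) / T                 # sample estimate of E[x~ x~^T]
eigvals, E = np.linalg.eigh(C)                # C = E D E^T (C is symmetric)
V = np.diag(eigvals ** -0.5) @ E.T            # V = D^{-1/2} E^T
z = V @ x_tilde                               # whitened data

cov_z = (z @ z.T) / T
print(np.round(cov_z, 6))
```

np.linalg.eigh is used rather than np.linalg.eig because C is symmetric. eigh returns real eigenvalues and orthonormal eigenvectors.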
Step 3: Estimate Independent Components Using Fixed-Point Iteration
The key insight in FastICA is to find vectors \mathbf{w} such that the projection \mathbf{w}^T \mathbf{z} is maximally non-Gaussian. Define a nonlinear function g(\cdot) and its derivative g'(\cdot), which are used to measure non-Gaussianity. Common choices are:
g(u) = \tanh(u), \quad g'(u) = 1 - \tanh^2(u)
g(u) = u\, e^{-u^2/2}, \quad g'(u) = (1 - u^2)\, e^{-u^2/2}
g(u) = u^3, \quad g'(u) = 3u^2
The iteration to update \mathbf{w} is:
\mathbf{w}^{\text{new}} = \mathbb{E}[\mathbf{z} g(\mathbf{w}^T \mathbf{z})] - \mathbb{E}[g'(\mathbf{w}^T \mathbf{z})] \mathbf{w}
Normalize:
\mathbf{w}^{\text{new}} \leftarrow \frac{\mathbf{w}^{\text{new}}}{\|\mathbf{w}^{\text{new}}\|}
Repeat until convergence.
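The one-unit fixed-point iteration with g(u) = tanh(u) can be sketched as below. This is an illustrative implementation, not scikit-learn's code. The helper names whiten and fastica_one_unit, the tolerance, and the toy data (a square wave plus Gaussian noise) are assumptions. Only the square wave is non-Gaussian, so the projection w^T z should end up strongly correlated with it.

```python
import numpy as np

def whiten(x):
    """Hypothetical helper: center and whiten x (rows = signals), as in Steps 1 and 2."""
    x_tilde = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(x_tilde, bias=True))
    return np.diag(d ** -0.5) @ E.T @ x_tilde

def fastica_one_unit(z, max_iter=200, tol=1e-8, seed=0):
    """Hypothetical helper: find a unit vector w making w^T z maximally non-Gaussian (g = tanh)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        u = w @ z                      # projections w^T z, shape (T,)
        th = np.tanh(u)                # g(u)
        g_prime = 1.0 - th ** 2        # g'(u)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w
        w_new = (z * th).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)  # normalize
        # Converged when the direction stops changing (a sign flip is allowed)
        if abs(abs(w_new @ w) - 1.0) < tol:
            return w_new
        w = w_new
    return w

rng = np.random.default_rng(1)
T = 5000
t = np.linspace(0, 8, T)
s = np.vstack([np.sign(np.sin(3 * t)), rng.normal(size=T)])  # square wave + Gaussian noise
x = np.array([[1.0, 1.0], [0.5, 2.0]]) @ s
z = whiten(x)

w = fastica_one_unit(z)
y = w @ z
corr = np.corrcoef(y, s[0])[0, 1]
print(round(abs(corr), 3))
```

Convergence is checked with |w_new^T w| rather than w_new^T w. The sign of an independent component cannot be determined, so w may flip between iterations without changing direction.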
Step 4: Deflation for Multiple Components
To find the weight vectors \mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n (one per independent component), run the iteration once for each component. After each update of \mathbf{w}_p, orthogonalize it with respect to the previously found vectors:
\mathbf{w}_p \leftarrow \mathbf{w}_p - \sum_{j=1}^{p-1} (\mathbf{w}_p^T \mathbf{w}_j) \mathbf{w}_j
Normalize again:
\mathbf{w}_p \leftarrow \frac{\mathbf{w}_p}{\|\mathbf{w}_p\|}
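A deflation-based sketch estimates the rows of W one at a time and applies the orthogonalization step above after every update. The helper names whiten and fastica_deflation, the toy sources (a sine, a square wave and Laplace noise) and the tolerances are illustrative assumptions. The mixing matrix matches the one used in the implementation section.

```python
import numpy as np

def whiten(x):
    """Hypothetical helper: center and whiten x (rows = signals)."""
    x_tilde = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(x_tilde, bias=True))
    return np.diag(d ** -0.5) @ E.T @ x_tilde

def fastica_deflation(z, n_components, max_iter=200, tol=1e-8, seed=0):
    """Hypothetical helper: deflation-based FastICA with g = tanh; returns W with one w_p per row."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_components, z.shape[0]))
    for p in range(n_components):
        w = rng.normal(size=z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            th = np.tanh(w @ z)
            w_new = (z * th).mean(axis=1) - (1.0 - th ** 2).mean() * w
            # Deflation: subtract sum_j (w_p^T w_j) w_j over the previously found w_1..w_{p-1}
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[p] = w
    return W

rng = np.random.default_rng(0)
T = 2000
t = np.linspace(0, 8, T)
s = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t)), rng.laplace(size=T)])
G = np.array([[1.0, 1.0, 1.0], [0.5, 2.0, 1.0], [1.5, 1.0, 2.0]])
z = whiten(G @ s)

W = fastica_deflation(z, n_components=3)
Y = W @ z                           # recovered components, one per row
corr = np.corrcoef(Y, s)[:3, 3:]    # corr[i, j]: component i vs source j
print(np.round(np.abs(corr), 2))
```

Because every w_p is orthogonalized against the earlier ones, the rows of W form an orthonormal set. Each recovered component should correlate strongly with exactly one source, although not necessarily in the original order.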
Python Implementation of FastICA
Now let's implement it step by step.
Step 1: Import Required Libraries
We will import NumPy for numerical computation, Matplotlib for plotting and the FastICA class from scikit-learn.
Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import FastICA
Step 2: Generate Source Signals (Sine, Square, Noise)
In this step we create the original signals that act as the sources we later want to separate. These are basic signal types commonly used in signal processing:
- s1: Smooth periodic sine wave
- s2: Sharp square wave (sub-Gaussian, i.e. negative excess kurtosis)
- s3: Random Gaussian noise
- S: Array of shape (2000, 3); each column is one independent source
- A: 3×3 mixing matrix
- X: Observed (mixed) signals, one mixture per column
Python
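np.random.seed(0)                        # reproducible noise
n_samples = 2000
time = np.linspace(0, 8, n_samples)
s1 = np.sin(2 * time)                    # sine wave
s2 = np.sign(np.sin(3 * time))           # square wave
s3 = np.random.normal(0, 1, n_samples)   # Gaussian noise
S = np.c_[s1, s2, s3]                    # shape (2000, 3), one source per column
# Mixing matrix (plays the role of G in x = Gs)
A = np.array([
    [1, 1, 1],
    [0.5, 2, 1.0],
    [1.5, 1.0, 2.0]
])
# Each row of X is one observation: x = A s applied to every sample
X = np.dot(S, A.T)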
Step 3: Apply FastICA to Recover the Signals
Now we fit the ICA model with FastICA.
- fit_transform(X) centers and whitens the data, runs the fixed-point iteration and returns the estimated sources
- S_estimated: Approximation of the original signals (shape: 2000 x 3), recovered only up to order, sign and scale
- A_estimated: Estimated mixing matrix
Python
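# random_state makes the result reproducible; whiten="unit-variance" is the
# default from scikit-learn 1.3 and silences a FutureWarning in 1.1-1.2
ica = FastICA(n_components=3, whiten="unit-variance", random_state=0)
S_estimated = ica.fit_transform(X)   # estimated sources, shape (2000, 3)
A_estimated = ica.mixing_            # estimated mixing matrix, shape (3, 3)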
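As a quick sanity check, the estimated sources and mixing_ should reproduce the observed data once the mean removed during centering is added back. The sketch below rebuilds the data from Step 2. It uses the fitted model's mean_ attribute and inverse_transform. With n_components equal to the number of mixtures, the reconstruction should be exact up to rounding.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Same sources and mixing matrix as in Step 2
np.random.seed(0)
n_samples = 2000
time = np.linspace(0, 8, n_samples)
S = np.c_[np.sin(2 * time), np.sign(np.sin(3 * time)), np.random.normal(0, 1, n_samples)]
A = np.array([[1, 1, 1], [0.5, 2, 1.0], [1.5, 1.0, 2.0]])
X = np.dot(S, A.T)

ica = FastICA(n_components=3, whiten="unit-variance", random_state=0)
S_estimated = ica.fit_transform(X)
A_estimated = ica.mixing_

# Rebuild every sample as x = A_estimated s + mean
X_rebuilt = S_estimated @ A_estimated.T + ica.mean_
print(np.allclose(X, X_rebuilt))
print(np.allclose(X, ica.inverse_transform(S_estimated)))
```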
Step 4: Plot the Results (Original, Mixed, Recovered)
Now we plot the original, mixed and recovered signals to see how well ICA performs blind source separation.
Python
plt.figure(figsize=(12, 8))
plt.subplot(3, 1, 1)
plt.title("Original Source Signals")
plt.plot(S)
plt.xlabel("Samples")
plt.subplot(3, 1, 2)
plt.title("Mixed Signals (Observed)")
plt.plot(X)
plt.xlabel("Samples")
plt.subplot(3, 1, 3)
plt.title("Recovered Signals (After ICA)")
plt.plot(S_estimated)
plt.xlabel("Samples")
plt.tight_layout()
plt.show()
Output:
[Figure: Blind source separation]
The output shows three stages of signal processing.
- The first plot shows the original source signals, i.e. a smooth sine wave, a square wave and random noise.
- The second plot shows them mixed together, which makes them hard to tell apart.
- The third plot shows the signals separated again by FastICA. They closely match the originals, although their order, sign and amplitude can differ because ICA cannot determine these.
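Because ICA cannot determine the order, sign or scale of the sources, a purely visual comparison can be misleading. One way to check the separation, sketched below with the same data as Step 2, works in three steps. First, correlate every recovered component with every original source. Second, pick the best match for each source. Third, reorder, flip and rescale the estimates. The alignment step is an illustrative addition and is not a scikit-learn feature.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Same sources and mixing matrix as in Step 2
np.random.seed(0)
n_samples = 2000
time = np.linspace(0, 8, n_samples)
S = np.c_[np.sin(2 * time), np.sign(np.sin(3 * time)), np.random.normal(0, 1, n_samples)]
A = np.array([[1, 1, 1], [0.5, 2, 1.0], [1.5, 1.0, 2.0]])
X = np.dot(S, A.T)

ica = FastICA(n_components=3, whiten="unit-variance", random_state=0)
S_estimated = ica.fit_transform(X)

# corr[i, j]: correlation between original source i and recovered component j
corr = np.corrcoef(S.T, S_estimated.T)[:3, 3:]
match = np.abs(corr).argmax(axis=1)  # best-matching component for each source
for i, j in enumerate(match):
    print(f"source {i} -> component {j}, correlation {corr[i, j]:+.3f}")

# Undo the ambiguities for plotting: reorder, flip signs, rescale to the source std
S_aligned = S_estimated[:, match] * np.sign(corr[np.arange(3), match])
S_aligned *= S.std(axis=0) / S_aligned.std(axis=0)
```

Plotting S_aligned instead of S_estimated puts each recovered signal in the same order, sign and scale as its original source.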