
Ordering Points To Identify the Clustering Structure (OPTICS) using Sklearn

Last Updated : 23 Jul, 2025

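OPTICS (Ordering Points To Identify the Clustering Structure) is a density-based clustering algorithm that finds clusters of arbitrary shape and varying density. It is closely related to DBSCAN. Instead of producing one flat clustering for a fixed eps, it computes an ordering of the points together with a reachability distance for each point. Clusters can then be extracted from this ordering at different density levels.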

Why use OPTICS instead of DBSCAN?

  • DBSCAN needs a single, fixed eps, which may not work well if some clusters are tight and others are loose.
  • OPTICS does not require a global distance threshold (in scikit-learn, max_eps defaults to infinity). It produces a reachability plot from which clusters can be extracted at different levels.
  • OPTICS handles datasets with varying densities better and can identify both dense and sparse clusters in one run.
  • It provides more detailed information about the cluster structure, making it easier to explore the data visually and to choose good cut-off points for clusters.
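To make the first two points concrete, here is a small sketch that is not part of the original tutorial. The two blobs, their spreads, min_samples=10 and the eps values are all illustrative assumptions. DBSCAN is run with one eps chosen to suit the tight blob, which is likely to leave most of the loose blob as noise. A single OPTICS fit, by contrast, can be cut at several eps values afterwards with cluster_optics_dbscan.

```python
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS, cluster_optics_dbscan

# Illustrative data (an assumption, not the article's dataset):
# a tight blob (std 0.2) and a loose blob (std 1.5), 200 points each
rng = np.random.default_rng(0)
tight = np.array([0.0, 0.0]) + 0.2 * rng.standard_normal((200, 2))
loose = np.array([6.0, 6.0]) + 1.5 * rng.standard_normal((200, 2))
X = np.vstack((tight, loose))


def noise_fraction(labels, start, stop):
    """Fraction of the points in rows start..stop-1 labelled as noise (-1)."""
    return float(np.mean(labels[start:stop] == -1))


# DBSCAN: one global eps, small enough to suit the tight blob
db_labels = DBSCAN(eps=0.2, min_samples=10).fit_predict(X)
print("DBSCAN eps=0.2, noise in tight blob:", noise_fraction(db_labels, 0, 200))
print("DBSCAN eps=0.2, noise in loose blob:", noise_fraction(db_labels, 200, 400))

# OPTICS: fit once, then cut the same ordering at several eps values
optics = OPTICS(min_samples=10).fit(X)
cuts = {}
for eps in (0.2, 1.0):
    cuts[eps] = cluster_optics_dbscan(
        reachability=optics.reachability_,
        core_distances=optics.core_distances_,
        ordering=optics.ordering_,
        eps=eps,
    )
    print(f"OPTICS cut at eps={eps}, noise in loose blob:",
          noise_fraction(cuts[eps], 200, 400))
```

Each OPTICS cut only re-reads reachability_, core_distances_ and ordering_, so trying another eps does not require refitting.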

In this article, we will learn how to implement it in Python.

Step 1: Importing Libraries

We import the necessary libraries: Matplotlib, NumPy and scikit-learn.

Python
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

Step 2: Creating Sample Data

We generate 6 groups of points (clusters). Each group is centered at a different location and has a different spread, and therefore a different density. All groups are stacked into one dataset, X_modified, of 1,200 points.

Python
np.random.seed(42)
n_points_per_cluster = 200

C1 = np.array([-3, -1]) + 1.0 * np.random.randn(n_points_per_cluster, 2)
C2 = np.array([2, -2]) + 0.5 * np.random.randn(n_points_per_cluster, 2)
C3 = np.array([0, 2]) + 0.8 * np.random.randn(n_points_per_cluster, 2)
C4 = np.array([-1, 4]) + 0.2 * np.random.randn(n_points_per_cluster, 2)
C5 = np.array([1, -3]) + 1.2 * np.random.randn(n_points_per_cluster, 2)
C6 = np.array([4, 5]) + 1.5 * np.random.randn(n_points_per_cluster, 2)

X_modified = np.vstack((C1, C2, C3, C4, C5, C6))
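Because the data is synthetic, we also know which group generated each point. The sketch below is a compact variant of the same generation code. It uses the same seed and the same order of random draws, so it should produce the same X_modified, and it additionally keeps a ground-truth label per point. The names centers, stds and y_true are our own additions and can be useful for checking the clustering later.

```python
import numpy as np

# Same procedure as above: seed 42, 200 points per group, groups C1..C6 in order
np.random.seed(42)
n_points_per_cluster = 200
centers = [(-3, -1), (2, -2), (0, 2), (-1, 4), (1, -3), (4, 5)]
stds = [1.0, 0.5, 0.8, 0.2, 1.2, 1.5]

X_modified = np.vstack(
    [np.array(c) + s * np.random.randn(n_points_per_cluster, 2)
     for c, s in zip(centers, stds)]
)

# Ground-truth group index (0..5) for each row, in stacking order
y_true = np.repeat(np.arange(len(centers)), n_points_per_cluster)
print(X_modified.shape, y_true.shape)
```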

Step 3: Apply OPTICS Clustering

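Now we apply OPTICS clustering with the following parameters:

  • min_samples=40: number of samples in a neighborhood (including the point itself) for a point to be considered a core point.
  • xi=0.1: minimum steepness on the reachability plot that marks a cluster boundary. It controls how sharp a change in reachability must be to start or end a cluster.
  • min_cluster_size=0.1: minimum cluster size, given as a fraction of the dataset (here 10% of 1,200, i.e. 120 points).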
Python
clust = OPTICS(min_samples=40, xi=0.1, min_cluster_size=0.1)
clust.fit(X_modified)
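OPTICS builds on two quantities. The core distance of a point is the distance to its min_samples-th nearest neighbor, with the point itself counted as the first. The reachability distance of a point o from a point p is max(core_distance(p), d(p, o)). The sketch below checks both definitions against a fitted model's core_distances_, reachability_ and predecessor_ attributes. To keep it fast, it uses a small random dataset instead of X_modified, and the helper reach_dist is hypothetical.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.neighbors import NearestNeighbors

# Small illustrative dataset (an assumption, not the article's data)
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 2))
min_samples = 10

clust = OPTICS(min_samples=min_samples).fit(X)

# Core distance: distance to the min_samples-th nearest neighbor,
# where kneighbors(X) returns each point itself as its first neighbor
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
dist, _ = nn.kneighbors(X)
core = dist[:, -1]
print("core distances match:", np.allclose(core, clust.core_distances_))


def reach_dist(o, p):
    """Hypothetical helper: reachability distance of point o from point p."""
    return max(core[p], np.linalg.norm(X[o] - X[p]))


# Every point after the first was reached from its recorded predecessor
ok = all(
    np.isclose(clust.reachability_[o], reach_dist(o, clust.predecessor_[o]))
    for o in clust.ordering_[1:]
)
print("reachability matches definition:", ok)
```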

Output:

OPTICS Model
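Changing xi does not require refitting either, because cluster_optics_xi re-extracts xi clusters from a fitted model's reachability_, predecessor_ and ordering_. The sketch below regenerates the article's data, fits the same model as above and compares a few xi values. The xi values chosen are arbitrary. With xi=0.1, the result should reproduce clust.labels_, since that is the extraction OPTICS performs during fit.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_xi

# Regenerate the article's data (seed 42, six groups of 200 points)
np.random.seed(42)
centers = [(-3, -1), (2, -2), (0, 2), (-1, 4), (1, -3), (4, 5)]
stds = [1.0, 0.5, 0.8, 0.2, 1.2, 1.5]
X_modified = np.vstack(
    [np.array(c) + s * np.random.randn(200, 2) for c, s in zip(centers, stds)]
)
clust = OPTICS(min_samples=40, xi=0.1, min_cluster_size=0.1).fit(X_modified)

# Re-extract xi clusters for several xi values without refitting
labels_by_xi = {}
for xi in (0.05, 0.1, 0.2):
    labels, _ = cluster_optics_xi(
        reachability=clust.reachability_,
        predecessor=clust.predecessor_,
        ordering=clust.ordering_,
        min_samples=40,
        min_cluster_size=0.1,
        xi=xi,
    )
    labels_by_xi[xi] = labels
    n_noise = int(np.sum(labels == -1))
    print(f"xi={xi}: {labels.max() + 1} clusters, {n_noise} noise points")
```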

Step 4: Extract Clusters Using DBSCAN Logic

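cluster_optics_dbscan extracts DBSCAN-style labels from the already fitted OPTICS model (its reachability_, core_distances_ and ordering_) for a given eps, so no refit is needed. Each eps acts as a horizontal cut through the reachability plot:

  • eps=0.7 gives smaller, tighter groups.
  • eps=1.5 gives larger, broader groups.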
Python
# DBSCAN-style labels for a cut at eps=0.7 (tighter groups)
labels_050 = cluster_optics_dbscan(
    reachability=clust.reachability_,
    core_distances=clust.core_distances_,
    ordering=clust.ordering_,
    eps=0.7,
)

# DBSCAN-style labels for a cut at eps=1.5 (broader groups)
labels_200 = cluster_optics_dbscan(
    reachability=clust.reachability_,
    core_distances=clust.core_distances_,
    ordering=clust.ordering_,
    eps=1.5,
)
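The extraction rule is simple enough to write out. The sketch below follows our reading of scikit-learn's implementation and walks through the points in OPTICS order:

  • A point whose reachability exceeds eps starts a new cluster if its own core distance is within eps; otherwise it is noise.
  • Every other point joins the most recently started cluster, or stays noise if none has started yet.

The helper dbscan_cut is hypothetical. Comparing it with cluster_optics_dbscan is a way to check that reading, not a description of the library's internals.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

# Regenerate the article's data (seed 42, six groups of 200 points)
np.random.seed(42)
centers = [(-3, -1), (2, -2), (0, 2), (-1, 4), (1, -3), (4, 5)]
stds = [1.0, 0.5, 0.8, 0.2, 1.2, 1.5]
X_modified = np.vstack(
    [np.array(c) + s * np.random.randn(200, 2) for c, s in zip(centers, stds)]
)
clust = OPTICS(min_samples=40, xi=0.1, min_cluster_size=0.1).fit(X_modified)


def dbscan_cut(reachability, core_distances, ordering, eps):
    """Hypothetical helper: DBSCAN-style labels read off an OPTICS ordering."""
    labels = np.empty(len(ordering), dtype=int)
    current = -1  # most recently started cluster (-1 = none started yet)
    for idx in ordering:
        if reachability[idx] > eps:
            if core_distances[idx] <= eps:
                current += 1  # a jump followed by a core point starts a new cluster
                labels[idx] = current
            else:
                labels[idx] = -1  # a jump to a non-core point is noise
        else:
            labels[idx] = current  # within eps: join the current cluster (or stay -1)
    return labels


matches = {}
for eps in (0.7, 1.5):
    ours = dbscan_cut(clust.reachability_, clust.core_distances_, clust.ordering_, eps)
    ref = cluster_optics_dbscan(
        reachability=clust.reachability_,
        core_distances=clust.core_distances_,
        ordering=clust.ordering_,
        eps=eps,
    )
    matches[eps] = bool(np.array_equal(ours, ref))
    print(f"eps={eps}: {ours.max() + 1} clusters, same as scikit-learn: {matches[eps]}")
```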

Step 5: Prepare Values for Plotting

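To build the reachability plot, we reorder the reachability distances and the OPTICS labels by clust.ordering_, the order in which OPTICS visited the points. The array space holds the corresponding x-axis positions.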

Python
space = np.arange(len(X_modified))
reachability = clust.reachability_[clust.ordering_]
labels = clust.labels_[clust.ordering_]
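In scikit-learn's xi extraction, each cluster is assigned a contiguous slice of the ordering. This is why clusters appear as unbroken valleys in the reachability plot. The sketch below checks that property on this data and also shows that the first point in the ordering has an infinite reachability, since it has no predecessor. The data is regenerated so the block runs on its own; the dictionary name contiguous is our own.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Regenerate the article's data and model (seed 42, six groups of 200 points)
np.random.seed(42)
centers = [(-3, -1), (2, -2), (0, 2), (-1, 4), (1, -3), (4, 5)]
stds = [1.0, 0.5, 0.8, 0.2, 1.2, 1.5]
X_modified = np.vstack(
    [np.array(c) + s * np.random.randn(200, 2) for c, s in zip(centers, stds)]
)
clust = OPTICS(min_samples=40, xi=0.1, min_cluster_size=0.1).fit(X_modified)

# Same preparation as in Step 5
space = np.arange(len(X_modified))
reachability = clust.reachability_[clust.ordering_]
labels = clust.labels_[clust.ordering_]

print("reachability of first point in the ordering:", reachability[0])

contiguous = {}
for k in range(labels.max() + 1):
    pos = space[labels == k]
    # One unbroken run if the positions span exactly len(pos) slots
    contiguous[k] = bool(pos[-1] - pos[0] + 1 == len(pos))
    finite = reachability[labels == k]
    finite = finite[np.isfinite(finite)]
    print(f"cluster {k}: positions {pos[0]}-{pos[-1]}, contiguous={contiguous[k]}, "
          f"median reachability={np.median(finite):.2f}")
```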

Step 6: Plotting the Results

Finally, the results are visualized in four subplots:

  • The reachability plot (top) shows the reachability distance of each point in OPTICS order. Valleys indicate clusters, and peaks indicate noise or separations between clusters. The horizontal lines mark the eps cut levels.
  • The bottom-left plot (OPTICS clustering) shows the clusters detected automatically by the xi method.
  • The bottom-middle plot (DBSCAN, eps=0.7) shows smaller, tighter clusters.
  • The bottom-right plot (DBSCAN, eps=1.5) merges clusters into broader groups.
Python
plt.figure(figsize=(10, 7))
G = gridspec.GridSpec(2, 3)
ax1 = plt.subplot(G[0, :])
ax2 = plt.subplot(G[1, 0])
ax3 = plt.subplot(G[1, 1])
ax4 = plt.subplot(G[1, 2])

# Reachability plot: points in OPTICS order, colored by OPTICS cluster label
colors = ["b.", "g.", "r.", "y.", "c.", "m."]
for klass in np.unique(labels[labels >= 0]):
    Xk = space[labels == klass]
    Rk = reachability[labels == klass]
    ax1.plot(Xk, Rk, colors[klass % len(colors)], alpha=0.3)
ax1.plot(space[labels == -1], reachability[labels == -1], "k.", alpha=0.3)
# Horizontal lines at the two eps cuts used in Step 4
ax1.plot(space, np.full_like(space, 1.5, dtype=float), "k-", alpha=0.5)
ax1.plot(space, np.full_like(space, 0.7, dtype=float), "k-.", alpha=0.5)
ax1.set_ylabel("Reachability (epsilon distance)")
ax1.set_title("Reachability Plot")

# OPTICS clustering result (xi method); loop over every cluster label found
for klass in np.unique(clust.labels_[clust.labels_ >= 0]):
    Xk = X_modified[clust.labels_ == klass]
    ax2.plot(Xk[:, 0], Xk[:, 1], colors[klass % len(colors)], alpha=0.3)
ax2.plot(X_modified[clust.labels_ == -1, 0], X_modified[clust.labels_ == -1, 1], "k+", alpha=0.1)
ax2.set_title("Automatic Clustering\nOPTICS")

# DBSCAN-style result at eps = 0.7
for klass in np.unique(labels_050[labels_050 >= 0]):
    Xk = X_modified[labels_050 == klass]
    ax3.plot(Xk[:, 0], Xk[:, 1], colors[klass % len(colors)], alpha=0.3)
ax3.plot(X_modified[labels_050 == -1, 0], X_modified[labels_050 == -1, 1], "k+", alpha=0.1)
ax3.set_title("Clustering at 0.7 epsilon cut\nDBSCAN")

# DBSCAN-style result at eps = 1.5
for klass in np.unique(labels_200[labels_200 >= 0]):
    Xk = X_modified[labels_200 == klass]
    ax4.plot(Xk[:, 0], Xk[:, 1], colors[klass % len(colors)], alpha=0.3)
ax4.plot(X_modified[labels_200 == -1, 0], X_modified[labels_200 == -1, 1], "k+", alpha=0.1)
ax4.set_title("Clustering at 1.5 epsilon cut\nDBSCAN")

plt.tight_layout()
plt.show()

Output:


This comparison highlights OPTICS's ability to detect clusters of varying densities, whereas DBSCAN needs an appropriate eps value to segment the data well. Here, both eps cuts are read from the same OPTICS fit. The reachability plot also gives insight into the data's structure, helping to identify both clusters and sparse regions.
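The visual comparison can be complemented with numbers. Because the data is synthetic, we know which of the six groups generated each point, so each labeling can be scored with scikit-learn's adjusted_rand_score (ARI). This is a sketch with a few caveats:

  • ARI treats all noise points (-1) as one extra group.
  • Overlapping blobs (for example C2 and C5) may limit the best achievable score.
  • The names y_true, results and summary are our own.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan
from sklearn.metrics import adjusted_rand_score

# Regenerate the article's data plus ground-truth group labels
np.random.seed(42)
centers = [(-3, -1), (2, -2), (0, 2), (-1, 4), (1, -3), (4, 5)]
stds = [1.0, 0.5, 0.8, 0.2, 1.2, 1.5]
X_modified = np.vstack(
    [np.array(c) + s * np.random.randn(200, 2) for c, s in zip(centers, stds)]
)
y_true = np.repeat(np.arange(len(centers)), 200)

clust = OPTICS(min_samples=40, xi=0.1, min_cluster_size=0.1).fit(X_modified)

results = {"OPTICS (xi=0.1)": clust.labels_}
for eps in (0.7, 1.5):
    results[f"DBSCAN cut eps={eps}"] = cluster_optics_dbscan(
        reachability=clust.reachability_,
        core_distances=clust.core_distances_,
        ordering=clust.ordering_,
        eps=eps,
    )

summary = {}
for name, lab in results.items():
    n_clusters = int(lab.max()) + 1       # labels are 0..k-1, noise is -1
    noise = float(np.mean(lab == -1))     # fraction of points marked as noise
    ari = adjusted_rand_score(y_true, lab)
    summary[name] = (n_clusters, noise, ari)
    print(f"{name:20s} clusters={n_clusters} noise={noise:.1%} ARI={ari:.3f}")
```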


