fclusterdata
scipy.cluster.hierarchy.fclusterdata(X, t, criterion='inconsistent', metric='euclidean', depth=2, method='single', R=None)
Cluster observation data using a given metric.
Clusters the original observations in the n-by-m data matrix X (n observations in m dimensions). With the default arguments, it uses the Euclidean distance metric to calculate distances between original observations, performs hierarchical clustering using the single linkage algorithm, and forms flat clusters using the inconsistency method with t as the cut-off threshold.
A 1-D array T of length n is returned. T[i] is the index of the flat cluster to which the original observation i belongs.
- Parameters:
- X : (N, M) ndarray
N by M data matrix with N observations in M dimensions.
- t : scalar
  - For criteria ‘inconsistent’, ‘distance’ or ‘monocrit’, this is the threshold to apply when forming flat clusters.
  - For ‘maxclust’ or ‘maxclust_monocrit’ criteria, this is the maximum number of clusters requested.
- criterion : str, optional
Specifies the criterion for forming flat clusters. Valid values are ‘inconsistent’ (default), ‘distance’, or ‘maxclust’. See fcluster for descriptions.
- metric : str or function, optional
The distance metric for calculating pairwise distances. See distance.pdist for descriptions, and linkage to verify compatibility with the linkage method.
- depth : int, optional
The maximum depth for the inconsistency calculation. See inconsistent for more information.
- method : str, optional
The linkage method to use (single, complete, average, weighted, median, centroid, ward). See linkage for more information. Default is “single”.
- R : ndarray, optional
The inconsistency matrix. If it is not passed, it is computed when needed.
- Returns:
- fclusterdata : ndarray
A vector of length n. T[i] is the flat cluster number to which original observation i belongs.
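The two meanings of t described above can be illustrated with a minimal sketch. It assumes a small illustrative dataset of four well-separated groups of three points, and compares a distance threshold (criterion='distance') with a cluster-count cap (criterion='maxclust'). The specific values t=1.5 and t=4 are choices made for this sketch, not recommendations.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# Four well-separated groups of three points each (illustrative data).
X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# criterion='distance': t is a threshold on the cophenetic distance.
# Points within a group are linked at distance 1, groups at distance 2,
# so a threshold of 1.5 keeps the groups apart.
by_distance = fclusterdata(X, t=1.5, criterion='distance')

# criterion='maxclust': t is the maximum number of flat clusters.
by_count = fclusterdata(X, t=4, criterion='maxclust')

# Report how many flat clusters each call produced.
print(np.unique(by_distance).size, np.unique(by_count).size)
```

Cluster labels are arbitrary integers, so comparing results across criteria is more robust when done on cluster counts or co-membership than on the raw label values.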
See also
scipy.spatial.distance.pdist
pairwise distance metrics
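As a rough illustration of the metric and method parameters, the sketch below passes a metric by name together with a non-default linkage method, and then passes a Python callable as the metric. The dataset, the threshold t=5, and the helper max_abs_diff are all hypothetical choices made for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# Illustrative data (not from this page): two tight groups far apart.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)

# A metric given by name, combined with complete linkage.
labels_named = fclusterdata(X, t=5, criterion='distance',
                            metric='cityblock', method='complete')


def max_abs_diff(u, v):
    """Hypothetical user-defined metric: largest coordinate difference."""
    return np.max(np.abs(u - v))


# A metric given as a function of two 1-D observation vectors.
labels_callable = fclusterdata(X, t=5, criterion='distance',
                               metric=max_abs_diff)

print(labels_named)
print(labels_callable)
```

Not every metric is meaningful with every linkage method, which is why the parameter description points to linkage for compatibility notes.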
Notes
This function is similar to the MATLAB function clusterdata.
fclusterdata has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.
| Library | CPU | GPU |
|---|---|---|
| NumPy | ✅ | n/a |
| CuPy | n/a | ⛔ |
| PyTorch | ✅ | ⛔ |
| JAX | ⚠️ no JIT | ⛔ |
| Dask | ⚠️ computes graph | n/a |
See Support for the array API standard for more information.
Examples
>>> from scipy.cluster.hierarchy import fclusterdata
This is a convenience function that wraps the steps of a typical SciPy hierarchical clustering workflow:
- Transform the input data into a condensed distance matrix with scipy.spatial.distance.pdist.
- Apply a clustering method.
- Obtain flat clusters at a user-defined distance threshold t using scipy.cluster.hierarchy.fcluster.
>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]
>>> fclusterdata(X, t=1)
array([3, 3, 3, 4, 4, 4, 2, 2, 2, 1, 1, 1], dtype=int32)
The output here (for the dataset X, distance threshold t, and the default settings) is four clusters with three data points each.
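The workflow steps listed earlier can also be written out by hand. The following is a minimal sketch, assuming the default arguments from the signature (metric='euclidean', method='single', criterion='inconsistent', depth=2) and assuming that the inconsistency matrix is computed with scipy.cluster.hierarchy.inconsistent when R is not supplied. It is meant to show the correspondence between the steps, not to reproduce the library's internals exactly.

```python
import numpy as np
from scipy.cluster.hierarchy import (fcluster, fclusterdata,
                                     inconsistent, linkage)
from scipy.spatial.distance import pdist

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# Step 1: condensed pairwise distance matrix.
Y = pdist(X, metric='euclidean')

# Step 2: hierarchical clustering (single linkage is the default method).
Z = linkage(Y, method='single')

# Inconsistency matrix for the default 'inconsistent' criterion.
R = inconsistent(Z, d=2)

# Step 3: flat clusters at threshold t.
manual = fcluster(Z, t=1, criterion='inconsistent', depth=2, R=R)

# The one-call convenience form, for comparison.
direct = fclusterdata(X, t=1)

# Check whether both routes assign the same labels.
print(np.array_equal(manual, direct))
```

Writing the steps out is mainly useful when the intermediate linkage matrix Z is needed for something else, such as plotting a dendrogram or cutting the same hierarchy at several thresholds.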