scipy.cluster.hierarchy.

cophenet#

scipy.cluster.hierarchy.cophenet(Z, Y=None)[source]#

Calculate the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z.

Suppose p and q are original observations in disjoint clusters s and t, respectively and s and t are joined by a direct parent cluster u. The cophenetic distance between observations i and j is simply the distance between clusters s and t.

Parameters:
Zndarray

The hierarchical clustering encoded as an array (see linkage function).

Yndarray (optional)

Calculates the cophenetic correlation coefficient c of a hierarchical clustering defined by the linkage matrix Z of a set of \(n\) observations in \(m\) dimensions. Y is the condensed distance matrix from which Z was generated.

Returns:
cndarray

The cophentic correlation distance (if Y is passed).

dndarray

The cophenetic distance matrix in condensed form. The \(ij\) th entry is the cophenetic distance between original observations \(i\) and \(j\).

See also

linkage

for a description of what a linkage matrix is.

scipy.spatial.distance.squareform

transforming condensed matrices into square ones.

Notes

cophenet has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.

Library

CPU

GPU

NumPy

n/a

CuPy

n/a

PyTorch

JAX

Dask

⚠️ merges chunks

n/a

See Support for the array API standard for more information.

Examples

>>> from scipy.cluster.hierarchy import single, cophenet
>>> from scipy.spatial.distance import pdist, squareform

Given a dataset X and a linkage matrix Z, the cophenetic distance between two points of X is the distance between the largest two distinct clusters that each of the points:

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]

X corresponds to this dataset

x x    x x
x        x

x        x
x x    x x
>>> Z = single(pdist(X))
>>> Z
array([[ 0.,  1.,  1.,  2.],
       [ 2., 12.,  1.,  3.],
       [ 3.,  4.,  1.,  2.],
       [ 5., 14.,  1.,  3.],
       [ 6.,  7.,  1.,  2.],
       [ 8., 16.,  1.,  3.],
       [ 9., 10.,  1.,  2.],
       [11., 18.,  1.,  3.],
       [13., 15.,  2.,  6.],
       [17., 20.,  2.,  9.],
       [19., 21.,  2., 12.]])
>>> cophenet(Z)
array([1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 2., 2.,
       2., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       1., 1., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 1., 1., 1.])

The output of the scipy.cluster.hierarchy.cophenet method is represented in condensed form. We can use scipy.spatial.distance.squareform to see the output as a regular matrix (where each element ij denotes the cophenetic distance between each i, j pair of points in X):

>>> squareform(cophenet(Z))
array([[0., 1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [1., 0., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [1., 1., 0., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 0., 1., 1., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 1., 0., 1., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 1., 1., 0., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 0., 1., 1., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 1., 0., 1., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 1., 1., 0., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 0., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 0.]])

In this example, the cophenetic distance between points on X that are very close (i.e., in the same corner) is 1. For other pairs of points is 2, because the points will be located in clusters at different corners - thus, the distance between these clusters will be larger.