is_valid_linkage
- scipy.cluster.hierarchy.is_valid_linkage(Z, warning=False, throw=False, name=None)
Check the validity of a linkage matrix.
A linkage matrix for \(n\) original observations is valid if it is a 2-D array (type double) with \(n-1\) rows and 4 columns. The first two columns must contain cluster indices between 0 and \(2n-3\); more precisely, for a given row \(i\) (counting from 0), the following two expressions have to hold:
\[0 \leq \mathtt{Z[i,0]} \leq i+n-1\]
\[0 \leq \mathtt{Z[i,1]} \leq i+n-1\]
That is, a row cannot merge a cluster unless that cluster is an original observation (index below \(n\)) or has already been formed by an earlier row.
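To make the index condition concrete, here is a minimal NumPy sketch that tests only the two expressions above, taking \(n\) as the number of original observations (the number of rows of Z plus one). The helper name indices_formed_before_use is hypothetical and not part of SciPy; is_valid_linkage itself performs further checks (type, shape, fourth column, and more), so this is an illustration rather than a replacement.

```python
import numpy as np


def indices_formed_before_use(Z):
    """Hypothetical helper, not part of SciPy.

    Checks only the documented index condition
    0 <= Z[i, j] <= i + n - 1 for j in (0, 1), where n is the
    number of original observations (rows of Z plus one).
    """
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0] + 1               # number of original observations
    i = np.arange(Z.shape[0])        # row indices 0 .. n - 2
    upper = i + n - 1                # largest cluster index row i may reference
    merged = Z[:, :2]                # the two merged cluster indices per row
    in_range = (merged >= 0) & (merged <= upper[:, None])
    return bool(np.all(in_range))


# 3 observations (indices 0, 1, 2); row 0 forms cluster 3, row 1 forms cluster 4.
good = [[0., 1., 1.0, 2.],   # merges observations 0 and 1
        [2., 3., 2.0, 3.]]   # merges observation 2 with cluster 3
bad = [[0., 3., 1.0, 2.],    # references cluster 3 before any row has formed it
       [1., 2., 2.0, 3.]]

print(indices_formed_before_use(good))  # True
print(indices_formed_before_use(bad))   # False
```

In the bad matrix, row 0 may only reference indices up to 0 + 3 - 1 = 2, so the reference to cluster 3 violates the first expression.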
The fourth column of Z represents the number of original observations in a cluster, so a valid Z[i, 3] value may not exceed the number of original observations.
- Parameters:
- Z : array_like
Linkage matrix.
- warning : bool, optional
When True, issues a Python warning if the linkage matrix passed is invalid.
- throw : bool, optional
When True, raises a Python exception if the linkage matrix passed is invalid.
- name : str, optional
This string refers to the variable name of the invalid linkage matrix and is used to identify it in warning and exception messages.
- Returns:
- b : bool
True if the linkage matrix is valid; False otherwise.
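The three ways of reporting an invalid matrix (plain return value, warning, exception) can be compared on a small hand-built example. This is a sketch: the matrix Z_bad is made up for illustration, and catching ValueError reflects what SciPy raises for this particular defect as far as we know, not a documented guarantee.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import is_valid_linkage

# Made-up 2-row linkage matrix for 3 observations. Row 0 references
# cluster 3, which is the cluster row 0 itself is supposed to form.
Z_bad = np.array([[0., 3., 1., 2.],
                  [1., 2., 2., 3.]])

# Default: the function only returns a boolean.
default_result = is_valid_linkage(Z_bad)
print(default_result)  # False

# warning=True: still returns False, but also emits a Python warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned_result = is_valid_linkage(Z_bad, warning=True, name="Z_bad")
print(warned_result, [str(w.message) for w in caught])

# throw=True: raises instead of returning False.
raised = False
try:
    is_valid_linkage(Z_bad, throw=True, name="Z_bad")
except ValueError as exc:
    raised = True
    print("raised:", exc)
```

Passing name is optional; it only changes how the matrix is referred to in the message text.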
See also
linkage: for a description of what a linkage matrix is.
Notes
Array API support (experimental): If the input is a lazy Array (e.g. Dask or JAX), the return value may be a 0-dimensional bool Array. When warning=True or throw=True, calling this function materializes the array.
is_valid_linkage has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.
Library    CPU            GPU
NumPy      ✅             n/a
CuPy       n/a            ✅
PyTorch    ✅             ✅
JAX        ⚠️ see notes   ⚠️ see notes
Dask       ⚠️ see notes   n/a
See Support for the array API standard for more information.
Examples
>>> from scipy.cluster.hierarchy import ward, is_valid_linkage
>>> from scipy.spatial.distance import pdist
All linkage matrices generated by the clustering methods in this module will be valid (i.e., they will have the appropriate dimensions and the two required expressions will hold for all the rows).
We can check this using scipy.cluster.hierarchy.is_valid_linkage:
>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]
>>> Z = ward(pdist(X))
>>> Z
array([[ 0.        ,  1.        ,  1.        ,  2.        ],
       [ 3.        ,  4.        ,  1.        ,  2.        ],
       [ 6.        ,  7.        ,  1.        ,  2.        ],
       [ 9.        , 10.        ,  1.        ,  2.        ],
       [ 2.        , 12.        ,  1.29099445,  3.        ],
       [ 5.        , 13.        ,  1.29099445,  3.        ],
       [ 8.        , 14.        ,  1.29099445,  3.        ],
       [11.        , 15.        ,  1.29099445,  3.        ],
       [16.        , 17.        ,  5.77350269,  6.        ],
       [18.        , 19.        ,  5.77350269,  6.        ],
       [20.        , 21.        ,  8.16496581, 12.        ]])
>>> is_valid_linkage(Z)
True
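The claim that the clustering methods in this module produce valid linkage matrices can be spot-checked across several method values of scipy.cluster.hierarchy.linkage. The sketch below uses arbitrary random data (the seed and the 20-by-3 shape are arbitrary choices), so it is an illustration rather than an exhaustive test.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, is_valid_linkage
from scipy.spatial.distance import pdist

# Arbitrary random data: 20 observations in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = pdist(X)  # condensed Euclidean distance matrix

# Methods accepted by scipy.cluster.hierarchy.linkage.
methods = ["single", "complete", "average", "weighted",
           "centroid", "median", "ward"]

# Build one linkage matrix per method and check each one.
results = {m: bool(is_valid_linkage(linkage(y, method=m))) for m in methods}
print(results)  # expected: True for every method
```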
However, if a linkage matrix is constructed incorrectly, or a valid one is modified so that any of the required expressions no longer holds, the check fails:
>>> Z[3, 1] = 20  # cluster 20 has not been formed yet at row 3
>>> is_valid_linkage(Z)
False