Use Distance Matrix in scipy.cluster.hierarchy.linkage()?

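It seems that we indeed cannot pass the redundant square matrix in directly, although the documentation claims we can. linkage() treats any 2-D array as a set of observation vectors, so a square distance matrix gets clustered as n points in n dimensions rather than as pairwise distances.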

To help anyone who faces the same problem in the future, I am writing my solution as an additional answer here, so that those who just want to copy and paste can proceed with the clustering.

Use the following snippet to condense the matrix and happily proceed.

import scipy.spatial.distance as ssd
# convert the redundant n*n square matrix form into a condensed nC2 array
# for i < j, distArray[{n choose 2} - {n-i choose 2} + (j-i-1)] is the distance between points i and j
distArray = ssd.squareform(distMatrix)
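
For anyone who wants to see the whole thing end to end, here is a minimal sketch. The toy points, the helper name condensed_index, and the choice of method="average" are my own assumptions for illustration, not part of the original question; substitute your own distMatrix and linkage method.

```python
from math import comb

import numpy as np
import scipy.spatial.distance as ssd
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: 4 points on a line, so distances are easy to check by hand.
points = np.array([0.0, 1.0, 3.0, 7.0])
n = len(points)

# Redundant n*n square form: symmetric, with zeros on the diagonal.
distMatrix = np.abs(points[:, None] - points[None, :])

# Condensed form: 1-D array of length n choose 2.
distArray = ssd.squareform(distMatrix)


def condensed_index(n, i, j):
    """Position of the distance between points i and j (i < j) in the condensed array."""
    return comb(n, 2) - comb(n - i, 2) + (j - i - 1)


# Pass the condensed array, not the square matrix, to linkage().
Z = linkage(distArray, method="average")
```

Here distArray[condensed_index(n, 1, 3)] gives back distMatrix[1, 3], and Z can be passed on to dendrogram() or fcluster() as usual.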

Please correct me if I am wrong.
