hierarchical-clustering - w3toppers.com

How to get flat clustering corresponding to color clusters in the dendrogram created by scipy

I think you’re on the right track. Let’s try this:

    import numpy as np
    import scipy.cluster.hierarchy as sch
    from scipy.spatial.distance import pdist

    X = np.random.randn(100, 2)  # 100 2-dimensional observations
    d = pdist(X)  # vector of (100 choose 2) pairwise distances
    L = sch.linkage(d, method='complete')
    ind = sch.fcluster(L, 0.5*d.max(), 'distance')

ind will give you cluster indices for each of the 100 …
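To make the flat clusters line up with the dendrogram colors, one option is to pass the same cut height to both `dendrogram` and `fcluster`. The sketch below assumes the documented default `color_threshold` of `0.7 * max(L[:, 2])`; the seed, the variable names, and the use of `no_plot=True` are illustrative choices, not part of the original answer. Leaves that end up alone below the cut may be colored differently from how `fcluster` labels them, so treat this as a starting point.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)          # fixed seed, chosen only for reproducibility
X = rng.standard_normal((100, 2))       # 100 2-dimensional observations
d = pdist(X)                            # condensed vector of pairwise distances
L = sch.linkage(d, method="complete")

# Cut height equal to dendrogram's documented default color threshold.
threshold = 0.7 * L[:, 2].max()

# Flat cluster labels (1-based) for the cut at `threshold`.
ind = sch.fcluster(L, threshold, criterion="distance")

# Build the dendrogram with the same threshold; no_plot=True skips drawing.
R = sch.dendrogram(L, no_plot=True, color_threshold=threshold)
```

Passing `threshold` explicitly to both calls avoids relying on the default and keeps the two views of the tree tied to one number.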

Use Distance Matrix in scipy.cluster.hierarchy.linkage()?

It seems that we indeed cannot pass the redundant square matrix in directly, although the documentation claims we can. To help anyone who faces the same problem in the future, I am writing my solution as an additional answer here, so that it can simply be copied and pasted before proceeding with the clustering. Use the following snippet …
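The original snippet is cut off here, so what follows is not the author's code but a minimal sketch of one common workaround: convert the square matrix to condensed form with `scipy.spatial.distance.squareform` before calling `linkage`. A 2-D array handed to `linkage` is interpreted as a set of observation vectors, which is why the square matrix cannot be passed as-is. The data, the seed, and the `average` method are assumptions made for illustration.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)            # illustrative data only
X = rng.standard_normal((20, 3))

# Stand-in for a precomputed, symmetric n x n distance matrix.
D_square = squareform(pdist(X))

# Square (redundant) form -> condensed vector of length n*(n-1)/2.
# checks=True validates symmetry and a zero diagonal.
condensed = squareform(D_square, checks=True)

# linkage expects the condensed form when given distances.
Z = sch.linkage(condensed, method="average")
```

The same `squareform` function converts in both directions, choosing the direction from the shape of its input.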

Text clustering with Levenshtein distances

This may be a bit simplistic, but here’s a code example that uses hierarchical clustering based on Levenshtein distance in R.

    set.seed(1)
    rstr <- function(n, k) {  # vector of n random char(k) strings
      sapply(1:n, function(i) {
        do.call(paste0, as.list(sample(letters, k, replace = TRUE)))
      })
    }

    str <- c(paste0("aa", rstr(10, 3)), paste0("bb", rstr(10, 3)), paste0("cc", rstr(10, 3)))
    # Levenshtein Distance
    d <- adist(str)
    rownames(d) <- str
    hc <- hclust(as.dist(d))
    plot(hc)
    rect.hclust(hc, k = 3)
    df <- data.frame(str, cutree(hc, k = 3))

In this …
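For readers working in Python, roughly the same idea can be sketched with SciPy. This is an assumption-laden translation, not the original answer: `levenshtein` is a hypothetical helper written here in plain Python, the word list is made up, and `method="complete"` is chosen to mirror the default of R's `hclust`.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import squareform


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Made-up example strings: two groups of near-identical words.
words = ["kitten", "sitten", "mitten", "flaw", "claw", "slaw"]

# Symmetric matrix of pairwise edit distances.
n = len(words)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = levenshtein(words[i], words[j])

# Condense the square matrix, cluster, and cut into at most 2 flat clusters.
Z = sch.linkage(squareform(D), method="complete")
labels = sch.fcluster(Z, t=2, criterion="maxclust")
```

Here `fcluster(..., criterion="maxclust")` plays the role of `cutree(hc, k = ...)`, and `labels[i]` is the cluster assigned to `words[i]`.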