How to get flat clustering corresponding to color clusters in the dendrogram created by scipy

I think you’re on the right track. Let’s try this:

    import numpy as np
    import scipy.cluster.hierarchy as sch

    X = np.random.randn(100, 2)  # 100 2-dimensional observations
    d = sch.distance.pdist(X)    # vector of (100 choose 2) pairwise distances
    L = sch.linkage(d, method='complete')
    ind = sch.fcluster(L, 0.5*d.max(), 'distance')

ind will give you cluster indices for each of the 100 …
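To tie the flat clusters to the dendrogram colors specifically, one option is to pass the same cut height to both `fcluster` and `dendrogram`. The sketch below assumes the default coloring rule documented for `scipy.cluster.hierarchy.dendrogram`, where `color_threshold` defaults to 0.7 times the largest merge height in the linkage matrix. The seed and variable names are illustrative, and `no_plot=True` is used only so the example runs without a plotting backend.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist

# Illustrative data: 100 two-dimensional observations (the seed is arbitrary).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))

d = pdist(X)                       # condensed vector of pairwise distances
L = linkage(d, method="complete")  # linkage matrix; column 2 holds merge heights

# Same rule as dendrogram's documented default color_threshold.
threshold = 0.7 * L[:, 2].max()

# Flat clusters obtained by cutting the tree at that height.
ind = fcluster(L, threshold, criterion="distance")

# Dendrogram layout computed with the identical threshold (nothing is drawn).
info = dendrogram(L, color_threshold=threshold, no_plot=True)

print(len(np.unique(ind)), "flat clusters at height", round(threshold, 3))
```

Passing the threshold explicitly to both calls keeps the two views in step even if a different cut height, such as the `0.5*d.max()` used above, is preferred.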

Text clustering with Levenshtein distances

This may be a bit simplistic, but here’s a code example that uses hierarchical clustering based on Levenshtein distance in R.

    set.seed(1)
    rstr <- function(n, k) {  # vector of n random char(k) strings
      sapply(1:n, function(i) {
        do.call(paste0, as.list(sample(letters, k, replace = TRUE)))
      })
    }

    str <- c(paste0("aa", rstr(10, 3)), paste0("bb", rstr(10, 3)), paste0("cc", rstr(10, 3)))
    # Levenshtein Distance
    d <- adist(str)
    rownames(d) <- str
    hc <- hclust(as.dist(d))
    plot(hc)
    rect.hclust(hc, k = 3)
    df <- data.frame(str, cutree(hc, k = 3))

In this …
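For readers working in Python, roughly the same idea can be sketched with SciPy. This is not a translation of the R code's exact output: the `levenshtein` and `random_strings` helpers are hypothetical names written here for illustration, the random strings will differ from the R ones, and complete linkage is chosen because it is the documented default of R's `hclust`.

```python
import random
import string

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def random_strings(n, k):
    """Return n random lowercase strings of length k."""
    return ["".join(random.choices(string.ascii_lowercase, k=k)) for _ in range(n)]


random.seed(1)
strings = [prefix + s for prefix in ("aa", "bb", "cc") for s in random_strings(10, 3)]

# Full symmetric matrix of pairwise Levenshtein distances.
n = len(strings)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = levenshtein(strings[i], strings[j])

# linkage expects the condensed form; squareform converts the square matrix.
Z = linkage(squareform(D), method="complete")

# Cut the tree into at most 3 flat clusters (comparable to cutree(hc, k=3)).
labels = fcluster(Z, t=3, criterion="maxclust")

for s, lab in zip(strings, labels):
    print(s, lab)
```

The pure-Python distance function is quadratic in string length and the matrix is quadratic in the number of strings, so this is only meant for small inputs like the 30 strings here.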