How to understand Locality Sensitive Hashing? [closed]

The best tutorial I have seen for LSH is in the book Mining of Massive Datasets. Check Chapter 3 – Finding Similar Items (http://infolab.stanford.edu/~ullman/mmds/ch3a.pdf). I also recommend these slides: http://www.cs.jhu.edu/%7Evandurme/papers/VanDurmeLallACL10-slides.pdf The example in the slides helped me a lot in understanding the hashing for cosine similarity. I borrow two slides from Benjamin Van …
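To make the idea of hashing for cosine similarity concrete, here is a minimal sketch of one common scheme, random-hyperplane hashing. It assumes this is the scheme the excerpt has in mind; the function names, bit count, and test vectors are illustrative and not taken from the book or the slides. Each bit records which side of a random hyperplane a vector falls on, and two vectors at angle θ disagree on a given bit with probability θ/π, so the fraction of differing bits gives an estimate of the angle.

```python
import numpy as np

def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplane normals (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))

def signature(planes, v):
    """One bit per hyperplane: True if v lies on its non-negative side."""
    return planes @ v >= 0

def estimated_cosine(sig_a, sig_b):
    """Estimate cos(theta) from the fraction of differing bits (theta / pi)."""
    frac_diff = np.mean(sig_a != sig_b)
    return np.cos(np.pi * frac_diff)

# Illustrative data: b is a noisy copy of a, so the two are fairly similar.
rng = np.random.default_rng(1)
a = rng.standard_normal(50)
b = a + 0.5 * rng.standard_normal(50)

planes = make_hyperplanes(n_bits=2048, dim=50)
true_cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
est_cos = estimated_cosine(signature(planes, a), signature(planes, b))
print(f"true cosine {true_cos:.3f}, estimated {est_cos:.3f}")
```

More bits give a tighter estimate at the cost of longer signatures. In a full LSH index the signature would additionally be split into bands that serve as bucket keys, which this sketch leaves out.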

Millions of 3D points: How to find the 10 of them closest to a given point?

A million points is a small number. The most straightforward approach works here (code based on KDTree is slower when querying only one point).

Brute-force approach (time ~1 second):

    #!/usr/bin/env python
    import numpy

    NDIM = 3  # number of dimensions

    # read points into array
    a = numpy.fromfile('million_3D_points.txt', sep=' ')
    a.shape = a.size // NDIM, NDIM  # integer division keeps the shape integral on Python 3
    …
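The excerpt's code is cut off, so the following is a minimal sketch of the same brute-force idea rather than the author's original continuation. It assumes the points are already in an (N, 3) NumPy array; random points stand in for the file, and `k_nearest` is a hypothetical helper name. It computes every squared distance and uses a partial sort to pick the 10 smallest.

```python
import numpy as np

def k_nearest(points, query, k=10):
    """Return indices and distances of the k points closest to query.

    Brute force: one pass over all points. Assumes k < len(points).
    """
    diff = points - query
    sq_dist = np.einsum('ij,ij->i', diff, diff)  # squared Euclidean distances
    idx = np.argpartition(sq_dist, k)[:k]        # k smallest, in arbitrary order
    idx = idx[np.argsort(sq_dist[idx])]          # order those k by distance
    return idx, np.sqrt(sq_dist[idx])

# Illustrative data: a million random 3D points instead of the file.
rng = np.random.default_rng(0)
points = rng.random((1_000_000, 3))
query = np.array([0.5, 0.5, 0.5])

idx, dist = k_nearest(points, query, k=10)
print(idx)
print(dist)
```

`np.argpartition` avoids sorting all million distances; only the 10 selected entries are fully sorted.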

Nearest neighbors in high-dimensional data?

I currently study such problems (classification, nearest neighbor searching) for music information retrieval. You may be interested in Approximate Nearest Neighbor (ANN) algorithms. The idea is that you allow the algorithm to return sufficiently near neighbors (perhaps not the nearest neighbor); in doing so, you reduce complexity. You mentioned the kd-tree; that is …
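The trade-off described above, accepting a sufficiently near neighbor in exchange for less work, can be illustrated with SciPy's `cKDTree`, whose `query` method takes an `eps` argument. This is an assumption chosen for illustration, not the library the answer goes on to recommend, and the data below is random. With `eps > 0` the returned neighbor is guaranteed to be no farther than (1 + eps) times the true nearest distance.

```python
import numpy as np
from scipy.spatial import cKDTree

# Illustrative data: 5000 random points in 8 dimensions.
rng = np.random.default_rng(0)
data = rng.random((5000, 8))
query = rng.random(8)

tree = cKDTree(data)
eps = 0.5  # allowed slack: result may be up to 1.5x the true nearest distance

d_exact, i_exact = tree.query(query, k=1)             # exact nearest neighbor
d_approx, i_approx = tree.query(query, k=1, eps=eps)  # approximate search

print(f"exact:  index {i_exact}, distance {d_exact:.4f}")
print(f"approx: index {i_approx}, distance {d_approx:.4f}")
```

A larger `eps` lets the search prune more branches of the tree; how much time that saves depends on the data and its dimensionality.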