Fastest pairwise distance metric in python

Neither of the other answers quite answered the question – 1 was in Cython, one was slower. But both provided very useful hints. Following up on them suggests that scipy.spatial.distance.pdist is the way to go.

Here’s some code:

import numpy as np
import random
import sklearn.metrics.pairwise
import scipy.spatial.distance

r = np.array([random.randrange(1, 1000) for _ in range(0, 1000)])
c = r[:, None]

def option1(r):
    dists = np.abs(r - r[:, None])

def option2(r):
    dists = scipy.spatial.distance.pdist(r, 'cityblock')

def option3(r):
    dists = sklearn.metrics.pairwise.manhattan_distances(r)

Timing with IPython:

In [36]: timeit option1(r)
100 loops, best of 3: 5.31 ms per loop

In [37]: timeit option2(c)
1000 loops, best of 3: 1.84 ms per loop

In [38]: timeit option3(c)
100 loops, best of 3: 11.5 ms per loop

I didn’t try the Cython implementation (I can’t use it for this project), but comparing my results to the other answer that did, it looks like scipy.spatial.distance.pdist is roughly a third slower than the Cython implementation (taking into account the different machines by benchmarking on the np.abs solution).

Leave a Comment