NumPy grouping using itertools.groupby performance

I get roughly a threefold speedup by doing something like this:

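import numpy as np

def group():
    # 35 million random keys in [0, 3298), stored as unsigned 32-bit integers
    values = np.random.randint(0, 3298, size=35000000).astype("u4")
    values.sort()
    # dif[i] > 0 marks the first element of each run of equal values
    dif = np.ones(values.shape, values.dtype)
    dif[1:] = np.diff(values)
    idx = np.flatnonzero(dif > 0)
    vals = values[idx]
    # run lengths; append the array length so the last group is counted too
    count = np.diff(np.append(idx, values.size))
    return vals, count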
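
To make the sort-and-diff idea easier to check, here is a minimal sketch on a tiny array. The helper name group_counts and the sample data are made up for illustration and are not part of the snippet above. It marks the first element of each run of equal values, then takes differences of those start positions (with the array length appended) to get the size of every group, including the last one.

```python
import numpy as np

def group_counts(values):
    """Return (unique sorted values, group sizes) for a 1-D integer array."""
    values = np.sort(values)
    if values.size == 0:
        return values, np.array([], dtype=np.intp)
    # True at the first element of every run of equal values
    starts = np.ones(values.size, dtype=bool)
    starts[1:] = values[1:] != values[:-1]
    idx = np.flatnonzero(starts)
    # Distance between consecutive run starts; the appended length closes the last run
    counts = np.diff(np.append(idx, values.size))
    return values[idx], counts

sample = np.array([3, 1, 3, 2, 1, 3], dtype="u4")
vals, counts = group_counts(sample)
print(vals, counts)  # [1 2 3] [2 1 3]
```

If you only need the values and their counts, np.unique(sample, return_counts=True) should give the same result and is a convenient cross-check.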
