numpy: most efficient frequency counts for unique values in an array

As of NumPy 1.9, the easiest and fastest method is simply to use numpy.unique, which now has a return_counts keyword argument:

import numpy as np

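x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
# unique holds the sorted distinct values; counts[i] is how often unique[i] occurs
unique, counts = np.unique(x, return_counts=True)

# Stack the two arrays as columns: one [value, count] row per distinct value
print(np.asarray((unique, counts)).T)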

Which gives:

[[ 1  5]
 [ 2  3]
 [ 5  1]
 [25  1]]
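If you need the counts in a more convenient shape, the two arrays returned by np.unique can be reused directly. The sketch below (the names freq and order are just illustrative) builds a plain dict of value to count and sorts the values from most to least frequent; it relies only on np.unique, np.argsort and standard ndarray methods.

```python
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])

# unique values come back sorted; counts[i] is how often unique[i] occurs
unique, counts = np.unique(x, return_counts=True)

# Map each value to its frequency (tolist() converts NumPy scalars to plain ints)
freq = dict(zip(unique.tolist(), counts.tolist()))
print(freq)  # {1: 5, 2: 3, 5: 1, 25: 1}

# Every element is counted exactly once
assert counts.sum() == x.size

# Order values from most to least frequent; a stable sort keeps ties in value order
order = np.argsort(-counts, kind="stable")
print(unique[order], counts[order])
```

Because np.unique already sorts the values, no extra sort is needed when you only want the table in value order.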

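A quick comparison in IPython with scipy.stats.itemfreq (which has since been deprecated and removed from SciPy, so this only runs on old versions):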

In [4]: x = np.random.randint(0, 101, 10**6)  # one million ints in [0, 100]

In [5]: %timeit unique, counts = np.unique(x, return_counts=True)
10 loops, best of 3: 31.5 ms per loop

In [6]: %timeit scipy.stats.itemfreq(x)
10 loops, best of 3: 170 ms per loop
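To repeat the comparison outside IPython, a script using the standard timeit module works too. This is a sketch under a few assumptions: it draws the data with NumPy's Generator API (NumPy 1.17 or later) instead of the deprecated random_integers, it only times scipy.stats.itemfreq if the installed SciPy still provides it, and itemfreq_like is a hypothetical helper that rebuilds itemfreq's [value, count] layout from np.unique. Timings will vary by machine, so none are shown.

```python
import timeit

import numpy as np

try:
    # scipy.stats.itemfreq only exists in older SciPy releases
    from scipy.stats import itemfreq
except ImportError:
    itemfreq = None


def itemfreq_like(a):
    """Hypothetical stand-in: one [value, count] row per distinct value."""
    values, counts = np.unique(a, return_counts=True)
    return np.column_stack((values, counts))


# Same distribution as random_integers(0, 100, 1e6): integers in [0, 100]
rng = np.random.default_rng(0)
x = rng.integers(0, 101, size=10**6)

candidates = {"np.unique(return_counts=True)": lambda: np.unique(x, return_counts=True)}
if itemfreq is not None:
    candidates["scipy.stats.itemfreq"] = lambda: itemfreq(x)

for name, fn in candidates.items():
    # Best of 3 repeats of 10 calls each, similar to %timeit's "10 loops, best of 3"
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name}: {best * 1e3:.1f} ms per loop")
```

On a current SciPy only the np.unique line is printed; itemfreq_like shows how code that expected itemfreq's two-column output can be ported.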

More Related Contents:

Performance of Pandas apply vs np.vectorize to create new column from existing columns
Why is numpy’s einsum faster than numpy’s built in functions?
Numpy sum elements in array based on its value
Efficiently return the index of the first value satisfying condition in array
Frequency counts for unique values in a NumPy array
Most efficient way to forward-fill NaN values in numpy array
Numpy: Fix array with rows of different lengths by filling the empty elements with zeros
Why is a `for` over a Python list faster than over a Numpy array?
Fastest save and load options for a numpy array
Why is Numpy much faster at creating a Zero array compared to replacing the values of an existing array with zeros?
Efficient dot products of large memory-mapped arrays
Creating a numpy array of 3D coordinates from three 1D arrays
How to print the full NumPy array, without truncation?
Working with big data in python and numpy, not enough ram, how to save partial results on disc?
How to apply a disc shaped mask to a NumPy array?
How to remove specific elements in a numpy array
Get the position of the largest value in a multi-dimensional NumPy array
Using Numpy Vectorize on Functions that Return Vectors
How to extend an array in-place in Numpy?
Why is numpy.array so slow?
Efficient evaluation of a function at every cell of a NumPy array
Numpy individual element access slower than for lists
How to turn a video into numpy array?
Can numpy bincount work with 2D arrays?
Sort invariant for numpy.argsort with multiple dimensions
Numpy isnan() fails on an array of floats (from pandas dataframe apply)
Efficiently replace elements in array based on dictionary – NumPy / Python
Randomly select from numpy array
How to “scale” a numpy array?
Numpy `ValueError: operands could not be broadcast together with shape …`
