Numpy individual element access slower than for lists

Question

In summary: getting an item from a NumPy array requires new Python objects to be created, whereas this is not the case for lists. Also, indexing is more slightly more complicated for NumPy arrays than lists which may add some additional overhead.

To recap, the NumPy operations you have listed do the following:

b[100][100] returns row 100 of b as an array, and then gets the value at index 100 of this row, returning the value as an object (e.g. a np.int64 type).
b[100,100] returns the value at row 100 and column 100 directly (no intermediate array is returned first).
b.item(100,100) does the same as above b[100,100] except that the value is converted to a native Python type and returned.

Now of these operation, (1) is slowest because it requires two sequential NumPy indexing operations (I’ll explain why this is slower than list indexing below). (2) is quickest because only a single indexing operation is performed. Operation (3) is possibly slower as it is a method call (these are generally slow in Python).

Why is list access still faster than b[100,100]?

Object creation

Python lists are arrays of pointers to objects in memory. For example, the list [1, 2, 3] does not contain those integers directly, but rather pointers to the memory addresses were each integer object already exists. To get an item from the list, Python just returns a reference to the object.

NumPy arrays are not collections of objects. The array np.array([1, 2, 3]) is just a contiguous block of memory with bits set to represent those integer values. To get an integer from this array, a new Python object must be constructed in memory separate to the array. For instance, an object of np.int64 may be returned by the indexing operation: this object did not exist previously and had to be created.

Indexing complexity

Two other reasons why a[100][100] (getting from the list) is quicker than b[100,100] (getting from the array) are that:

The bytecode opcode BINARY_SUBSCR is executed when indexing both lists and arrays, but it is optimised for the case of Python lists.
The internal C function handling integer indexing for Python lists is very short and simple. On the other hand, NumPy indexing is much more complicated and a significant amount of code is executed to determine the type of indexing being used so that the correct value can be returned.

Below, the steps for accessing elements in a list and array with a[100][100] and b[100,100] are described in more detail.

Bytecode

The same four bytecode opcodes are triggered for both lists and arrays:

  0 LOAD_NAME                0 (a)           # the list or array
  3 LOAD_CONST               0 (100)         # index number (tuple for b[100,100])
  6 BINARY_SUBSCR                            # find correct "getitem" function
  7 RETURN_VALUE                             # value returned from list or array

Note: if you start chain indexing for multi-dimensional lists, e.g. a[100][100][100], you start to repeat these bytecode instructions. This does not happen for NumPy arrays using the tuple indexing: b[100,100,100] uses just the four instructions. This is why the gap in the timings begins to close as the number of dimensions increases.

Finding the correct “getitem” function

The functions for accessing lists and arrays are different and the correct one needs to be found in each case. This task is handled by the BINARY_SUBSCR opcode:

w = POP();                                            // our index
v = TOP();                                            // our list or NumPy array
if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {    // do we have a list and an int?
    /* INLINE: list[int] */
    Py_ssize_t i = PyInt_AsSsize_t(w);
        if (i < 0)
             i += PyList_GET_SIZE(v);
        if (i >= 0 && i < PyList_GET_SIZE(v)) {
             x = PyList_GET_ITEM(v, i);               // call "getitem" for lists
             Py_INCREF(x);
        }
        else
            goto slow_get;
     }
     else
       slow_get:
         x = PyObject_GetItem(v, w);                  // else, call another function
                                                      // to work out what is needed
     Py_DECREF(v);
     Py_DECREF(w);
     SET_TOP(x);
     if (x != NULL) continue;
     break;

This code is optimised for Python lists. If the function sees a list, it will quickly call the function PyList_GET_ITEM. This list can now be accessed at the required index (see next section below).

However, if it doesn’t see a list (e.g. we have a NumPy array), it takes the “slow_get” path. This in turn calls another function PyObject_GetItem to check which “getitem” function the object is mapped to:

PyObject_GetItem(PyObject *o, PyObject *key)
{
    PyMappingMethods *m;

    if (o == NULL || key == NULL)
        return null_error();

    m = o->ob_type->tp_as_mapping;
    if (m && m->mp_subscript)
        return m->mp_subscript(o, key);
    ...

In the case of NumPy arrays, the correct function is located in mp_subscript in the PyMappingMethods structure.

Notice the additional function calls before this correct “get” function can be called. These calls add to the overhead for b[100], although how much will depend on how Python/NumPy was compiled, the system architecture, and so on.

Getting from a Python list

Above it was seen that the function PyList_GET_ITEM is called. This is a short function that essentially looks like this*:

PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    if (!PyList_Check(op)) {                            // check if list
        PyErr_BadInternalCall();
        return NULL;
    }
    if (i < 0 || i >= Py_SIZE(op)) {                    // check i is in range
        if (indexerr == NULL) {
            indexerr = PyUnicode_FromString(
                "list index out of range");
            if (indexerr == NULL)
                return NULL;
        }
        PyErr_SetObject(PyExc_IndexError, indexerr);
        return NULL;
    }
    return ((PyListObject *)op) -> ob_item[i];           // return reference to object
}

* PyList_GET_ITEM is actually the macro form of this function which does the same thing, minus error checking.

This means that getting the item at index i of a Python list is relatively simple. Internally, Python checks whether the type of the item being is a list, whether i is in the correct range for the list, and then returns the reference to the object in the list.

Getting from a NumPy array

In contrast, NumPy has to do much more work before the value at the requested index can be returned.

Arrays can be indexed in a variety of different ways and NumPy has to decide which index routine is needed. The various indexing routines are handled largely by code in mapping.c.

Anything used to index NumPy arrays passes through the function prepare_index which begins the parsing of the index and stores the information about broadcasting, number of dimensions, and so on. Here is the call signature for the function:

NPY_NO_EXPORT int
prepare_index(PyArrayObject *self, PyObject *index,
              npy_index_info *indices,
              int *num, int *ndim, int *out_fancy_ndim, int allow_boolean)

 /* @param the array being indexed
  * @param the index object
  * @param index info struct being filled (size of NPY_MAXDIMS * 2 + 1)
  * @param number of indices found
  * @param dimension of the indexing result
  * @param dimension of the fancy/advanced indices part
  * @param whether to allow the boolean special case 
  */

The function has to do a lot of checks. Even for a relatively simple index such as b[100,100], a lot of information has to be inferred so that NumPy can return a reference (view) to the correct value.

In conclusion, it takes longer for the “getitem” function for NumPy to be found and the functions handling the indexing of arrays are necessarily more complex than the single function for Python lists.