Why is a function/method call in python expensive?

A function call requires that the current execution frame is suspended, and a new frame is created and pushed on the stack. This is relatively expensive, compared to many other operations.

You can measure the time required with the timeit module:

>>> import timeit
>>> def f(): pass
... 
>>> timeit.timeit(f)
0.15175890922546387

That’s roughly 1/6th of a second for a million calls to an empty function. Compare that with the cost of whatever you are thinking of putting into the function; if performance is an issue, that 0.15 seconds of pure call overhead needs to be taken into account.

Python has a “relatively high” function call overhead; it is the cost we pay for some of Python’s most useful functionality.
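To put the overhead in context, here is a small sketch (the function names are mine) comparing a loop that does a trivial computation inline against one that makes a function call per iteration:

```python
import timeit

def add_inline():
    total = 0
    for i in range(1000):
        total += i * 2          # work done inline
    return total

def double(i):
    return i * 2

def add_via_call():
    total = 0
    for i in range(1000):
        total += double(i)      # same work, plus a call per iteration
    return total

# Both compute the same value; the second pays frame setup 1000 times.
print(timeit.timeit(add_inline, number=1000))
print(timeit.timeit(add_via_call, number=1000))
```

On a typical machine the call-per-iteration version is noticeably slower, even though the arithmetic is identical.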

Monkey Patching:

You have so much power to monkey-patch/override behavior in Python that the interpreter can’t guarantee that given

 a, b = X(1), X(2)
 return a.fn() + b.fn() + a.fn()

a.fn() and b.fn() resolve to the same function, or that a.fn() will still behave the same after b.fn() has been called.
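A contrived sketch (the class and names are mine) of why the interpreter cannot cache that lookup — a call to b.fn() can change what a.fn() means:

```python
class X:
    def __init__(self, n):
        self.n = n
    def fn(self):
        return self.n

a, b = X(1), X(2)

def patching_fn():
    # calling b.fn() replaces fn on the class, behind the interpreter's back
    X.fn = lambda self: -self.n
    return b.n

b.fn = patching_fn   # instance attribute shadows the class method

result = a.fn() + b.fn() + a.fn()   # 1 + 2 + (-1) == 2
```

The two a.fn() calls return different values, so each one really does have to be looked up fresh.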

In [1]: def f(a, b):
   ...:     return a.fn() + b.fn() + c.fn()
   ...:

In [2]: dis.dis(f)
  1           0 LOAD_FAST                0 (a)
              3 LOAD_ATTR                0 (fn)
              6 CALL_FUNCTION            0
              9 LOAD_FAST                1 (b)
             12 LOAD_ATTR                0 (fn)
             15 CALL_FUNCTION            0
             18 BINARY_ADD
             19 LOAD_GLOBAL              1 (c)
             22 LOAD_ATTR                0 (fn)
             25 CALL_FUNCTION            0
             28 BINARY_ADD
             29 RETURN_VALUE

In the above, you can see that ‘fn’ is looked up afresh at each call site. The same applies to variables, though people seem to be more aware of that.

In [11]: def g(a):
    ...:     return a.i + a.i + a.i
    ...:

In [12]: dis.dis(g)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_ATTR                0 (i)
              6 LOAD_FAST                0 (a)
              9 LOAD_ATTR                0 (i)
             12 BINARY_ADD
             13 LOAD_FAST                0 (a)
             16 LOAD_ATTR                0 (i)
             19 BINARY_ADD
             20 RETURN_VALUE
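When you know the attribute will not change between reads, capturing it in a local replaces two of those LOAD_ATTRs with cheap local reads. A sketch (the class is mine; note this is only safe if a.i is not a property with side effects):

```python
class A:
    def __init__(self):
        self.i = 7

def g(a):
    return a.i + a.i + a.i      # three LOAD_ATTR lookups

def g_cached(a):
    i = a.i                     # one LOAD_ATTR ...
    return i + i + i            # ... then fast local reads
```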

Worse, because modules can monkey-patch or replace themselves and each other, if you are calling a global or module-level function, the global/module has to be looked up every time:

In [16]: def h():
    ...:     v = numpy.vector(numpy.vector.identity)
    ...:     for i in range(100):
    ...:         v = numpy.vector.add(v, numpy.vector.identity)
    ...:

In [17]: dis.dis(h)
  2           0 LOAD_GLOBAL              0 (numpy)
              3 LOAD_ATTR                1 (vector)
              6 LOAD_GLOBAL              0 (numpy)
              9 LOAD_ATTR                1 (vector)
             12 LOAD_ATTR                2 (identity)
             15 CALL_FUNCTION            1
             18 STORE_FAST               0 (v)

  3          21 SETUP_LOOP              47 (to 71)
             24 LOAD_GLOBAL              3 (range)
             27 LOAD_CONST               1 (100)
             30 CALL_FUNCTION            1
             33 GET_ITER
        >>   34 FOR_ITER                33 (to 70)
             37 STORE_FAST               1 (i)

  4          40 LOAD_GLOBAL              0 (numpy)
             43 LOAD_ATTR                1 (vector)
             46 LOAD_ATTR                4 (add)
             49 LOAD_FAST                0 (v)
             52 LOAD_GLOBAL              0 (numpy)
             55 LOAD_ATTR                1 (vector)
             58 LOAD_ATTR                2 (identity)
             61 CALL_FUNCTION            2
             64 STORE_FAST               0 (v)
             67 JUMP_ABSOLUTE           34
        >>   70 POP_BLOCK
        >>   71 LOAD_CONST               0 (None)
             74 RETURN_VALUE

WORKAROUND

Consider capturing or importing any values you expect not to mutate:

import os.path

def f1(files):
    for filename in files:
        if os.path.exists(filename):
            yield filename

# vs

def f2(files):
    from os.path import exists
    for filename in files:
        if exists(filename):
            yield filename

# or

def f3(files, exists=os.path.exists):
    for filename in files:
        if exists(filename):
            yield filename
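One caveat worth knowing about f3’s form: the default argument is evaluated once, at definition time, so later monkey-patching of os.path.exists will not be seen by f3. A sketch demonstrating this:

```python
import os.path

def f3(files, exists=os.path.exists):
    for filename in files:
        if exists(filename):
            yield filename

# Replace os.path.exists after f3 was defined...
real_exists = os.path.exists
os.path.exists = lambda path: False
try:
    found = list(f3([os.__file__]))   # ...f3 still uses the original
finally:
    os.path.exists = real_exists
```

Whether that early binding is a feature or a bug depends on your use case; it is exactly what makes the unit-test hook in the “In the wild” section work.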

See also the “In the wild” section below.

It’s not always possible to import, though; for example, you can import sys.stdin (from sys import stdin), but you can’t import sys.stdin.readline, because stdin is an object rather than a module; numpy types can have similar problems:

In [15]: def h():
    ...:     from numpy import vector
    ...:     add = vector.add
    ...:     idy = vector.identity
    ...:     v   = vector(idy)
    ...:     for i in range(100):
    ...:         v = add(v, idy)
    ...:

In [16]: dis.dis(h)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               2 (('vector',))
              6 IMPORT_NAME              0 (numpy)
              9 IMPORT_FROM              1 (vector)
             12 STORE_FAST               0 (vector)
             15 POP_TOP

  3          16 LOAD_FAST                0 (vector)
             19 LOAD_ATTR                2 (add)
             22 STORE_FAST               1 (add)

  4          25 LOAD_FAST                0 (vector)
             28 LOAD_ATTR                3 (identity)
             31 STORE_FAST               2 (idy)

  5          34 LOAD_FAST                0 (vector)
             37 LOAD_FAST                2 (idy)
             40 CALL_FUNCTION            1
             43 STORE_FAST               3 (v)

  6          46 SETUP_LOOP              35 (to 84)
             49 LOAD_GLOBAL              4 (range)
             52 LOAD_CONST               3 (100)
             55 CALL_FUNCTION            1
             58 GET_ITER
        >>   59 FOR_ITER                21 (to 83)
             62 STORE_FAST               4 (i)

  7          65 LOAD_FAST                1 (add)
             68 LOAD_FAST                3 (v)
             71 LOAD_FAST                2 (idy)
             74 CALL_FUNCTION            2
             77 STORE_FAST               3 (v)
             80 JUMP_ABSOLUTE           59
        >>   83 POP_BLOCK
        >>   84 LOAD_CONST               0 (None)
             87 RETURN_VALUE
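For the sys.stdin.readline case, the equivalent workaround is to capture the bound method in a local. A sketch (the function is mine; io.StringIO stands in for stdin so it runs anywhere):

```python
import io
import sys

def read_n_lines(n, readline=None):
    # capture the bound method once; the loop body then uses a fast local
    if readline is None:
        readline = sys.stdin.readline
    return [readline() for _ in range(n)]

lines = read_n_lines(2, readline=io.StringIO("a\nb\nc\n").readline)
```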

CAVEAT EMPTOR:

  • Capturing variables is not a zero-cost operation, and it increases the frame size,
  • Only use this after identifying hot code paths,

Argument Passing

Python’s argument-passing mechanism looks trivial, but unlike in most languages it costs a lot. We’re talking about splitting arguments into args and kwargs:

f(1, 2, 3)
f(1, 2, c=3)
f(c=3)
f(1, 2)  # c takes its default value

There’s a lot of work going on inside the CALL_FUNCTION opcode, potentially including a transition from the C layer to the Python layer and back.
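A rough sketch of how you might measure the different calling conventions yourself (absolute numbers vary by machine and Python version, so no figures are claimed here):

```python
import timeit

def f(a, b, c=0):
    return a + b + c

n = 100_000
t_positional = timeit.timeit(lambda: f(1, 2, 3), number=n)
t_keyword    = timeit.timeit(lambda: f(1, 2, c=3), number=n)
t_default    = timeit.timeit(lambda: f(1, 2), number=n)
# keyword passing and default filling typically cost a little extra,
# because the interpreter has to match names and fill in missing slots
```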

In addition to this, the parameters often need to be looked up to be passed:

f(obj.x, obj.y, obj.z)

Consider:

In [28]: def fn(obj):
    ...:     f = some.module.function
    ...:     for x in range(1000):
    ...:         for y in range(1000):
    ...:             f(x + obj.x, y + obj.y, obj.z)
    ...:

In [29]: dis.dis(fn)
  2           0 LOAD_GLOBAL              0 (some)
              3 LOAD_ATTR                1 (module)
              6 LOAD_ATTR                2 (function)
              9 STORE_FAST               1 (f)

  3          12 SETUP_LOOP              76 (to 91)
             15 LOAD_GLOBAL              3 (range)
             18 LOAD_CONST               1 (1000)
             21 CALL_FUNCTION            1
             24 GET_ITER
        >>   25 FOR_ITER                62 (to 90)
             28 STORE_FAST               2 (x)

  4          31 SETUP_LOOP              53 (to 87)
             34 LOAD_GLOBAL              3 (range)
             37 LOAD_CONST               1 (1000)
             40 CALL_FUNCTION            1
             43 GET_ITER
        >>   44 FOR_ITER                39 (to 86)
             47 STORE_FAST               3 (y)

  5          50 LOAD_FAST                1 (f)
             53 LOAD_FAST                2 (x)
             56 LOAD_FAST                0 (obj)
             59 LOAD_ATTR                4 (x)
             62 BINARY_ADD
             63 LOAD_FAST                3 (y)
             66 LOAD_FAST                0 (obj)
             69 LOAD_ATTR                5 (y)
             72 BINARY_ADD
             73 LOAD_FAST                0 (obj)
             76 LOAD_ATTR                6 (z)
             79 CALL_FUNCTION            3
             82 POP_TOP
             83 JUMP_ABSOLUTE           44
        >>   86 POP_BLOCK
        >>   87 JUMP_ABSOLUTE           25
        >>   90 POP_BLOCK
        >>   91 LOAD_CONST               0 (None)
             94 RETURN_VALUE

Here “LOAD_GLOBAL” requires that the name be hashed and the globals dictionary probed for it. A dict lookup is amortized O(1), but it is still far more expensive than indexing an array.

But think about this: for our two, simple, 0-1000 loops, we are doing that a million times…

LOAD_ATTR is also a hash-table lookup, against the namespace of the object that was just loaded. LOAD_FAST, by contrast, is a plain index into the frame’s array of local variables — which is exactly why locals are cheap…
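You can see the difference with dis (a sketch; exact opcode names vary slightly across Python versions):

```python
import dis

GLOBAL = 1

def uses_global():
    return GLOBAL + GLOBAL      # two separate LOAD_GLOBAL lookups

def uses_local():
    g = GLOBAL                  # one LOAD_GLOBAL ...
    return g + g                # ... then fast local reads

global_ops = [i.opname for i in dis.get_instructions(uses_global)]
local_ops  = [i.opname for i in dis.get_instructions(uses_local)]
```

The interpreter never merges the two GLOBAL reads into one, because anything could rebind the name between them.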

But also note that we are calling a function there a million times. Fortunately, it’s a built-in function, and builtins have a much less prohibitive overhead; but if this were really your perf hotspot, you might want to reduce the overhead of the nested range loops by doing something like:

x, y = 0, 0
for i in range(1000 * 1000):
    ....
    y += 1
    if y >= 1000:
        x, y = x + 1, 0

You could do some hacking with captured variables, but it’s likely to have minimal perf impact on this code while making it less maintainable.
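On the flattening trick above: divmod can recover both indices from a single counter, which is a tidier way to write the same loop (tiny sizes here so it is easy to check):

```python
coords = []
for i in range(5 * 3):
    x, y = divmod(i, 3)     # x counts 0..4, y cycles 0..2
    coords.append((x, y))

# same ordering as the nested-loop version
nested = [(x, y) for x in range(5) for y in range(3)]
```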

But the core pythonic fix to this problem is to use generators or iterables:

for i in obj.values():
    prepare(i)

# vs

prepare(obj.values())

and

for i in ("left", "right", "up", "down"):
    test_move(i)

# vs

test_move(("left", "right", "up", "down"))

and

for x in range(-1000, 1000):
    for y in range(-1000, 1000):
        fn(x + obj.x, y + obj.y, obj.z)

# vs

def coordinates(obj):
    for x in range(obj.x - 1000, obj.x + 1000 + 1):
        for y in range(obj.y - 1000, obj.y + 1000 + 1):
            yield x, y, obj.z

fn(coordinates(obj))
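A small self-contained sketch of that generator-passing pattern (fn, Obj, and the radius are mine, chosen tiny so it runs instantly):

```python
class Obj:
    x, y, z = 0, 0, 5

def coordinates(obj, radius=2):
    for x in range(obj.x - radius, obj.x + radius + 1):
        for y in range(obj.y - radius, obj.y + radius + 1):
            yield x, y, obj.z

def fn(triples):
    # one call consumes the whole iterable: no per-cell call overhead
    return sum(x + y + z for x, y, z in triples)

total = fn(coordinates(Obj()))   # 25 cells; x and y sums cancel, 25 * 5
```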

In the wild

You’ll see these Pythonic idioms in the wild in forms like this:

def some_fn(a, b, c, stdin=sys.stdin):
    ...

This has several advantages:

  • it documents the default in help() for this function (the default input is stdin),
  • provides a hook for unit tests,
  • promotes sys.stdin to a local (LOAD_FAST vs LOAD_GLOBAL+LOAD_ATTR)
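For instance (a sketch; the function is mine), the default argument makes the function trivially testable without touching the real stdin:

```python
import io
import sys

def count_lines(stdin=sys.stdin):
    return sum(1 for _ in stdin)

# unit tests can substitute any file-like object:
n = count_lines(stdin=io.StringIO("one\ntwo\nthree\n"))
```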

Most numpy calls either take, or have a variant that takes, a list, array, etc.; if you’re not using those, you’re probably missing out on 99% of numpy’s benefits.

def distances(target, candidates):
    values = []
    for candidate in candidates:
        values.append(numpy.linalg.norm(candidate - target))
    return numpy.array(values)

# vs

def distances(target, candidates):
    # axis=1 gives one norm per candidate row rather than a single scalar
    return numpy.linalg.norm(candidates - target, axis=1)

(Note: this isn’t necessarily the best way to get distances, especially if you are not going to forward the distance values elsewhere; e.g. if you’re doing a range check, it is probably more efficient to compare squared distances and avoid the sqrt operation.)
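The squared-distance comparison looks like this in plain Python (a sketch; coordinates are 2-D tuples here):

```python
def in_range(target, candidate, radius):
    dx = candidate[0] - target[0]
    dy = candidate[1] - target[1]
    # comparing squared values gives the same answer with no sqrt
    return dx * dx + dy * dy <= radius * radius
```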

Optimizing for iterables doesn’t just mean passing them, but also returning them:

def f4(files, exists=os.path.exists):
    # returns a generator expression
    return (filename for filename in files if exists(filename))
