Why is matrix multiplication faster with numpy than with ctypes in Python?

NumPy uses a highly optimized, carefully tuned BLAS routine for matrix multiplication (see also: ATLAS). The specific function in this case is GEMM (general matrix multiply). You can look up the reference implementation by searching for dgemm.f (it’s in Netlib).
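If you want to see which BLAS your own NumPy build is linked against, something like the following should work. It is only a sketch: the array shapes are arbitrary, and whether the product actually goes through a BLAS GEMM depends on how NumPy was built on your machine.

```python
import numpy as np

# Print NumPy's build configuration, including any BLAS/LAPACK
# libraries it was linked against (output varies by installation).
np.show_config()

# Two float64 matrices; the shapes here are arbitrary illustration values.
a = np.random.rand(200, 300)
b = np.random.rand(300, 100)

# For 2-D floating-point arrays, the @ operator (np.matmul) is the call
# that is normally handed off to the underlying BLAS routine.
c = a @ b
print(c.shape, c.dtype)
```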

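The optimization, by the way, goes beyond compiler optimizations. Above, Philip mentioned Coppersmith–Winograd, and a commenter suggested Strassen’s algorithm. As far as I know, though, ATLAS and similar BLAS libraries do not use either of these for GEMM. They keep the standard O(n^3) algorithm and get their speed from cache blocking, vectorized kernels, and tuning for the specific CPU. Coppersmith–Winograd is of theoretical interest only, since its constant factors make it impractical at realistic matrix sizes.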

In other words, your matmult algorithm is the trivial implementation. There are faster ways to do the same thing.
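To get a feel for the gap, here is a rough comparison of a textbook triple loop against NumPy's product. The `naive_matmul` helper and the 60x60 size are made up for illustration, and the loop is pure Python rather than C called through ctypes, so it exaggerates the difference compared with your case. Treat the timings as indicative only.

```python
import time
import numpy as np

def naive_matmul(a, b):
    """Textbook triple-loop product of two matrices given as nested lists.

    Hypothetical helper for illustration; not part of NumPy.
    """
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            out[i][j] = s
    return out

rng = np.random.default_rng(0)
A = rng.random((60, 60))
B = rng.random((60, 60))

t0 = time.perf_counter()
C_naive = naive_matmul(A.tolist(), B.tolist())
t1 = time.perf_counter()
C_numpy = A @ B
t2 = time.perf_counter()

print(f"naive loops:   {t1 - t0:.6f} s")
print(f"numpy (A @ B): {t2 - t1:.6f} s")
print("results agree:", np.allclose(C_naive, C_numpy))
```

Both versions compute the same product; the difference is in how the work is laid out for the hardware, not in the result.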
