How to get element-wise matrix multiplication (Hadamard product) in numpy?

For elementwise multiplication of matrix objects, you can use numpy.multiply: import numpy as np a = np.array([[1,2],[3,4]]) b = np.array([[5,6],[7,8]]) np.multiply(a,b) Result array([[ 5, 12], [21, 32]]) However, you should really use array instead of matrix. matrix objects have all sorts of horrible incompatibilities with regular ndarrays. With ndarrays, you can just use * for … Read more

How to get faster code than numpy.dot for matrix multiplication?

np.dot dispatches to BLAS when NumPy has been compiled to use BLAS, a BLAS implementation is available at run-time, your data has one of the dtypes float32, float64, complex32 or complex64, and the data is suitably aligned in memory. Otherwise, it defaults to using its own, slow, matrix multiplication routine. Checking your BLAS linkage is … Read more

Why list comprehension is much faster than numpy for multiplying arrays?

Creation of numpy arrays is much slower than creation of lists: In [153]: %timeit a = [[2,3,5],[3,6,2],[1,3,2]] 1000000 loops, best of 3: 308 ns per loop In [154]: %timeit a = np.array([[2,3,5],[3,6,2],[1,3,2]]) 100000 loops, best of 3: 2.27 µs per loop There can also fixed costs incurred by NumPy function calls before the meat of … Read more

Why is matrix multiplication faster with numpy than with ctypes in Python?

NumPy uses a highly-optimized, carefully-tuned BLAS method for matrix multiplication (see also: ATLAS). The specific function in this case is GEMM (for generic matrix multiplication). You can look up the original by searching for dgemm.f (it’s in Netlib). The optimization, by the way, goes beyond compiler optimizations. Above, Philip mentioned Coppersmith–Winograd. If I remember correctly, … Read more

Minimizing overhead due to the large number of Numpy dot calls

It depends on the size of the matrices Edit For larger nxn matrices (aprox. size 20) a BLAS call from compiled code is faster, for smaller matrices custom Numba or Cython Kernels are usually faster. The following method generates custom dot- functions for given input shapes. With this method it is also possible to benefit … Read more

Efficient 4×4 matrix vector multiplication with SSE: horizontal add and dot product – what’s the point?

Horizontal add and dot product instructions are complex: they are decomposed into multiple simpler microoperations which are executed by processor just like simple instructions. The exact decomposition of horizontal add and dot product instructions into microoperations is processor-specific, but for recent Intel processors horizontal add is decomposed into 2 SHUFFLE + 1 ADD microoperations, and … Read more