How to speed up matrix multiplication in C++?

Speaking of speed-up, your function will be more cache-friendly if you swap the order of the k and j loop iterations:

matrix mult_std(matrix a, matrix b) {
 matrix c(a.dim(), false, false);
 for (int i = 0; i < a.dim(); i++)
  for (int k = 0; k < a.dim(); k++)
   for (int j = 0; j < a.dim(); j++)  // swapped order
    c(i,j) += a(i,k) * b(k,j);

 return c;
}

That’s because a k index on the inner-most loop will cause a cache miss in b on every iteration. With j as the inner-most index, both c and b are accessed contiguously, while a stays put.

Leave a Comment