Speaking of speed-up, your function will be more cache-friendly if you swap the order of the `k` and `j` loops:
    matrix mult_std(matrix a, matrix b) {
        matrix c(a.dim(), false, false);
        for (int i = 0; i < a.dim(); i++)
            for (int k = 0; k < a.dim(); k++)
                for (int j = 0; j < a.dim(); j++) // swapped order
                    c(i,j) += a(i,k) * b(k,j);
        return c;
    }
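For reference, here is a minimal self-contained sketch of both loop orders. The `Mat` struct is a hypothetical stand-in for the question's `matrix` class. It assumes row-major storage and zero-initialised construction, and the original code confirms neither.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical square matrix, stored row-major: element (i,j) is data[i * n + j].
// This is an assumed stand-in, not the question's `matrix` class.
struct Mat {
    std::size_t n;
    std::vector<double> data;
    explicit Mat(std::size_t size) : n(size), data(size * size, 0.0) {}  // zero-filled
    double& operator()(std::size_t i, std::size_t j) { return data[i * n + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * n + j]; }
    std::size_t dim() const { return n; }
};

// i-j-k order: k is innermost, so each b(k,j) access jumps a whole row ahead.
Mat mult_ijk(const Mat& a, const Mat& b) {
    assert(a.dim() == b.dim());
    const std::size_t n = a.dim();
    Mat c(n);
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; k++)
                sum += a(i, k) * b(k, j);
            c(i, j) = sum;
        }
    return c;
}

// i-k-j order: j is innermost, so c(i,j) and b(k,j) walk along a row.
Mat mult_ikj(const Mat& a, const Mat& b) {
    assert(a.dim() == b.dim());
    const std::size_t n = a.dim();
    Mat c(n);
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t k = 0; k < n; k++) {
            const double aik = a(i, k);  // constant for the whole j loop
            for (std::size_t j = 0; j < n; j++)
                c(i, j) += aik * b(k, j);
        }
    return c;
}
```

Both functions add the products for each `c(i,j)` in the same `k` order, so they should return identical results. Only the memory access pattern differs. Hoisting `a(i,k)` into `aik` makes explicit that `a` stays put in the inner loop.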
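That’s because, assuming the matrix is stored in row-major order, a `k` index on the inner-most loop makes each access to `b` jump a whole row ahead. For large matrices, that will likely cause a cache miss in `b` on almost every iteration. With `j` as the inner-most index, both `c` and `b` are accessed contiguously, while `a` stays put.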
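To put numbers on the explanation above, the sketch below computes how far apart in memory consecutive inner-loop accesses to `b(k,j)` are under each loop order. It assumes row-major storage of `double` elements and a 64-byte cache line, which is typical of x86-64 and many ARM cores but not universal. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of element (row, col) in an n x n row-major array of doubles.
// Row-major layout is an assumption about the question's `matrix` class.
std::size_t byte_offset(std::size_t row, std::size_t col, std::size_t n) {
    return (row * n + col) * sizeof(double);
}

// Distance between consecutive b(k,j) accesses when k is the inner index.
std::size_t stride_inner_k(std::size_t n) {
    return byte_offset(1, 0, n) - byte_offset(0, 0, n);  // one full row
}

// Distance between consecutive b(k,j) accesses when j is the inner index.
std::size_t stride_inner_j(std::size_t n) {
    return byte_offset(0, 1, n) - byte_offset(0, 0, n);  // one element
}

// How many consecutive inner-loop accesses fall within one cache line.
std::size_t accesses_per_line(std::size_t stride, std::size_t line_bytes) {
    assert(stride > 0 && line_bytes > 0);
    return stride >= line_bytes ? 1 : line_bytes / stride;
}
```

For `n = 1024`, stepping `k` moves `1024 * sizeof(double)` bytes (8 KiB with 8-byte doubles), so every access lands on a different cache line. Stepping `j` moves one element, so with 8-byte doubles and 64-byte lines, eight consecutive accesses share a line.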