Sending columns of a matrix using MPI_Scatter

There’s a long description of this issue in my answer to this question: the fact that many people have these questions is proof that it’s not obvious and the ideas take some getting used to.

The important thing to know is what memory layout the MPI datatype describes. The calling sequence to MPI_Type_vector is:

int MPI_Type_vector(int count,
                   int blocklength,
                   int stride, 
                   MPI_Datatype old_type,
                   MPI_Datatype *newtype_p)

It creates a new type which describes a layout of memory where every stride items, there is a block of blocklength items pulled out, and a total of count of these blocks. Items here are in units of whatever the old_type was. So for instance, if you called (naming the parameters here, which you can’t actually do in C, but:)

 MPI_Type_vector(count=3, blocklength=2, stride=5, old_type=MPI_INT, &newtype);

Then newtype would describe a layout in memory like this:

   |<----->|  block length

   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   | X | X |   |   |   | X | X |   |   |   | X | X |   |   |   |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

   |<---- stride ----->|

   count = 3

where each square is one integer-sized chunk of memory, presumably 4 bytes. Note that the stride is the distance in integers from the start of one block to the start of the next, not the distance between blocks.

Ok, so in your case you called

  MPI_Type_vector(N, 1, N, MPI_DOUBLE, &col);

which will take count = N blocks, each of size blocklength=1 MPI_DOUBLEs, with a space between the start of each block of stride=N MPI_DOUBLEs. In other words, it’ll take every N’th double, a total of N times; perfect for extracting one column out of a (contiguously-stored) NxN array of doubles. A handy check is to see how much data is being strided over (count*stride = N*N which is the full size of the matrix, check) and how much data is actually included (count*blocksize = N, which is the size of a column, check.)

If all you had to do was call MPI_Send and MPI_Recv to exchange individual columns, you’d be done; you could use this type to describe the layout of the column and you’d be fine. But there’s one more thing.

You want to call MPI_Scatter, which sends the first coltype (say) to processor 0, the next coltype to processor 1, etc. If you’re doing that with a simple 1d array, it’s easy to figure out where the “next” data type is; if you’re scattering 1 int to each processor, the “next” int starts immediately after the first int ends.

But your new coltype column has a total extent that goes from the start of the column to N*N MPI_DOUBLEs later — if MPI_Scatter follows the same logic (it does), it would start looking for the “next” column outside of the matrices memory entirely, and so on with the next and next. Not only would you not get the answer you wanted, the program would likely crash.

The way to fix this is to tell MPI that the “size” of this data type for the purposes of calculating where the “next” one lies is the size in memory between where one column starts and the next column starts; that is, exactly one MPI_DOUBLE. This doesn’t effect the amount of data sent, which is still 1 columns worth of data; it only affects the “next one in line” calculation. With columns (or rows) in an array, you can just sent this size to be the appropriate step size in memory, and MPI will choose the correct next column to send. Without this resize operator, your program would likely crash.

When you have more complicated data layouts, like in the 2d-blocks of a 2d-array example linked to above, then there is no one single step size between “next” items; you still need to do the resizing trick to make the size be some useful unit, but then you need to use MPI_Scatterv rather than scatter to explicitly specify the locations to send from.

Leave a Comment