Overlapping sliding window over an image using blockproc or im2col?

OK, this is quite a complex question. I’ll try and break this up into separate parts and will answer each question separately.

Question #1

blockproc can be used to implement my function on a sliding window using the BorderSize and TrimBorder arguments.

B = blockproc(A,[64,64],fun,'BorderSize',[5,5], 'TrimBorder', 'false');

I realize that this creates a block of [64 + 2*5, 64 + 2*5] and applies the function @fun on each block. But since I cannot go into my function @fun in debugging to verify proper operation I cannot be sure this is what I need. Is my above code correct for what I need? I know that I get a concatenated result in B but it should be on a overlapping sliding block. Will this achieve what I need?

After experimenting around with blockproc, this is indeed correct where you can use it to get sliding neighbourhood processing working. However, you’re going to need an additional flag, which is PadPartialBlocks. The purpose of this flag is so that if you are extracting a block where you’re at the outer edges of the image and you can’t make a block of a specified size, this will zero-pad this partial block so that it conforms to the same size. Here’s a small example to get this working with sliding windows. Supposing we had a matrix such that:

>> A = reshape(1:25,5,5)

A =

     1     6    11    16    21
     2     7    12    17    22
     3     8    13    18    23
     4     9    14    19    24
     5    10    15    20    25

Let’s say we wanted to take the average of each 3 x 3 overlapping neighbourhood in the matrix above, and zero-padding those elements that go beyond the borders of the matrix. You would do this with blockproc:

B = blockproc(A, [1 1], @(x) mean(x.data(:)), 'BorderSize', [1 1], 'TrimBorder', false, 'PadPartialBlocks', true);

What’s important to note is that the block size, which is 1 x 1 in this case and BorderSize which is 1 x 1 as well are set differently than what you’d expect for a 3 x 3 block. To go into why this is the case, we need some further insight on how BorderSize works. For a given centre of a block, BorderSize allows you to capture values / pixels beyond the dimensions of the originally sized block. For those locations that go beyond the borders of the matrix, we would pad these locations as zero by default. BorderSize allows us to capture 2M + 2N pixels more, where M and N are the horizontal and vertical border size you want. This would allow us to capture M more pixels both above and below the original block and N more pixels to the left and right of the original block.

Therefore, for the value of 1 in A, if the block size is 1 x 1, this means that the element consists of only 1, and if our BorderSize was 1 x 1. This means our final block would be:

0  0  0
0  1  6
0  2  7

Because our block size is 1, the next block would be centred at 6, and we would get a 3 x 3 grid of pixels and so on. It is also important that TrimBorder is set to false so that we can keep those pixels that were originally captured upon expansion of the block. The default is set to true. Finally, PadPartialBlocks is true to ensure that all blocks are the same size. When you run the above code, the result we get is:

B =

    1.7778    4.3333    7.6667   11.0000    8.4444
    3.0000    7.0000   12.0000   17.0000   13.0000
    3.6667    8.0000   13.0000   18.0000   13.6667
    4.3333    9.0000   14.0000   19.0000   14.3333
    3.1111    6.3333    9.6667   13.0000    9.7778

You can verify that we get the same result using nlfilter where we can apply the mean to 3 x 3 sliding neighbourhoods:

C = nlfilter(A, [3 3], @(x) mean(x(:)))

C =

    1.7778    4.3333    7.6667   11.0000    8.4444
    3.0000    7.0000   12.0000   17.0000   13.0000
    3.6667    8.0000   13.0000   18.0000   13.6667
    4.3333    9.0000   14.0000   19.0000   14.3333
    3.1111    6.3333    9.6667   13.0000    9.7778

As such, if you want to properly use blockproc for sliding operations, you need to be careful on how you set the block size and border size respectively. In this case, the general rule is to always set your block size to be 1 x 1, and allow BorderSize to specify the size of each block you want. Specifically, for a block of size K x K, you would set BorderSize to be floor(K/2) x floor(K/2) respectively. It would make things easy if K was odd.

For example, if you wanted a 5 x 5 mean filtering operation on a sliding window basis, you would set BorderSize to [2 2], as K = 5 and floor(K/2) = 2. Therefore, you would do this:

B = blockproc(A, [1 1], @(x) mean(x.data(:)), 'BorderSize', [2 2], 'TrimBorder', false, 'PadPartialBlocks', true)

B =

    2.5200    4.5600    7.2000    6.9600    6.1200
    3.6000    6.4000   10.0000    9.6000    8.4000
    4.8000    8.4000   13.0000   12.4000   10.8000
    4.0800    7.0400   10.8000   10.2400    8.8800
    3.2400    5.5200    8.4000    7.9200    6.8400

Replicating this with nlfilter with a size of 5 x 5 also gives:

C = nlfilter(A, [5 5], @(x) mean(x(:)))

C =

    2.5200    4.5600    7.2000    6.9600    6.1200
    3.6000    6.4000   10.0000    9.6000    8.4000
    4.8000    8.4000   13.0000   12.4000   10.8000
    4.0800    7.0400   10.8000   10.2400    8.8800
    3.2400    5.5200    8.4000    7.9200    6.8400

I was doing some timing tests, and it seems that blockproc used in this context is faster than nlfilter.

Question #2

The second is im2col. im2col(A,[m n],block_type) will divide the block into m by n blocks and arrange them in columns, so each column is a block? If so, how is the overlapping controlled? And if each block is a column can I successfully apply the dct2 function on each column? Because I doubt it will take vectors as input?

You are correct in that im2col transforms each pixel neighbourhood or block into a single column and the concatenation of these columns forms the output matrix. You can control whether the blocks are overlapping or are distinct by the block_type parameter. Specify distinct or sliding (which is default) to control this. You can also control the size of each neighbourhood with m and n.

However, if it is your goal to apply dct2 with the output of im2col, then you will not get what you desire. Specifically, dct2 takes the spatial location of each data point within your 2D data into account and is used as part of the transform. By transforming each pixel neighbourhood into a single column, the 2D spatial relationships that were originally there for each block are now gone. dct2 expects 2D spatial data, but you would be specifying 1D data instead. As such, im2col is probably not what you’re looking for. If I understand what you want correctly, you’ll want to use blockproc instead.


Hope this helps!

Leave a Comment