
2-D convolution as a matrix-matrix multiplication [closed]


I know that, in the 1D case, the convolution between two vectors a and b can be computed as conv(a, b), but also as the product T_a * b, where T_a is the Toeplitz matrix corresponding to a.

Is it possible to extend this idea to 2D?

Given a = [5 1 3; 1 1 2; 2 1 3] and b = [4 3; 1 2], is it possible to convert a into a Toeplitz matrix and compute the matrix-matrix product between T_a and b as in the 1-D case?
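For context, the 1-D claim can be checked with a small Python sketch (my own illustration; the vectors are made up), building T_a with scipy.linalg.toeplitz:

import numpy as np
from scipy.linalg import toeplitz

a = np.array([1., 2., 3.])
b = np.array([4., 5.])

# T_a has shape (len(a) + len(b) - 1, len(b)): its first column is a
# padded with zeros, and its first row is [a[0], 0, ..., 0].
col = np.r_[a, np.zeros(len(b) - 1)]
row = np.r_[a[0], np.zeros(len(b) - 1)]
T_a = toeplitz(col, row)

assert np.allclose(T_a @ b, np.convolve(a, b))  # full 1-D convolution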


primfaktor

Yes, it is possible, and you should also use a doubly block circulant matrix (which is a special case of a Toeplitz matrix). I will give you an example with a small kernel and input, but it is possible to construct a Toeplitz matrix for any kernel. So you have a 2D input x and a 2D kernel k, and you want to calculate the convolution x * k. Also let's assume that k is already flipped. Let's also assume that x is of size n×n and k is m×m.

So you unroll k into a sparse matrix of size (n−m+1)² × n², and unroll x into a long vector of size n² × 1. Then you multiply this sparse matrix by the vector and convert the resulting vector (which has size (n−m+1)² × 1) into an (n−m+1)×(n−m+1) square matrix.

I am pretty sure this is hard to understand just from reading, so here is an example for a 2×2 kernel and a 3×3 input.

https://i.stack.imgur.com/FlUZU.png

Here is a constructed matrix with a vector:

https://i.stack.imgur.com/yLd8G.png

https://i.stack.imgur.com/oMhJs.png

And this is the same result you would have got by doing a sliding window of k over x.
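Here is a minimal numpy sketch of the same construction (my own illustration, not part of the original answer; the values of x and k are made up) that builds the unrolled kernel matrix for the 2×2 kernel / 3×3 input case and checks it against a sliding window:

import numpy as np
from scipy.signal import correlate2d

x = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
k = np.array([[1., 2.],
              [3., 4.]])  # assumed already flipped, as in the answer

n, m = x.shape[0], k.shape[0]
out = n - m + 1  # side length of the output

# Build the (n-m+1)^2 x n^2 unrolled kernel matrix: the row for output
# position (i, j) holds the kernel entries at the columns of x.flatten()
# that its window touches.
K = np.zeros((out * out, n * n))
for i in range(out):
    for j in range(out):
        for p in range(m):
            for q in range(m):
                K[i * out + j, (i + p) * n + (j + q)] = k[p, q]

result = (K @ x.flatten()).reshape(out, out)

# Since k is taken as already flipped, the sliding window is a correlation:
assert np.array_equal(result, correlate2d(x, k, mode='valid'))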


There would have to be some sort of reshaping at the end, correct? That last vector is 4×1, but the result of the convolution would be 2×2.
@jvans Yes, in the end you should reshape your vector. It is written here: convert the resulting vector (which will have a size (n−m+1)² × 1) into an (n−m+1) square matrix.
In your example this is not a Toeplitz matrix, so your answer is only partially correct, isn't it?
What do you mean by "let's assume that k is already flipped"? Is it because we want to perform correlation instead of convolution? What is "flipped" in terms of numpy operations?
@mrgloom Yes, the operation above is a correlation, which is why he flips the filter vertically (upside down) first, so it then becomes equivalent to a convolution. In numpy that is flip(m, 0), which is equivalent to flipud(m).
Ali Salehi

1- Define Input and Filter

Let I be the input signal and F be the filter or kernel.

https://i.stack.imgur.com/zjGRw.png

2- Calculate the final output size

If I is m1 × n1 and F is m2 × n2, the size of the output will be (m1 + m2 − 1) × (n1 + n2 − 1):

https://i.stack.imgur.com/yLSFB.png

3- Zero-pad the filter matrix

Zero-pad the filter to make it the same size as the output.

https://i.stack.imgur.com/XUTfK.png

4- Create a Toeplitz matrix for each row of the zero-padded filter

https://i.stack.imgur.com/vblU4.png

5- Create a doubly blocked Toeplitz matrix

https://i.stack.imgur.com/DaAPx.png

https://i.stack.imgur.com/gk5MI.png

6- Convert the input matrix to a column vector

https://i.stack.imgur.com/QI0O6.png

7- Multiply the doubly blocked Toeplitz matrix with the vectorized input signal

This multiplication gives the convolution result.

8- Last step: reshape the result to a matrix form

https://i.stack.imgur.com/a5yWA.png

For more details and Python code, take a look at my GitHub repository:

Step by step explanation of 2D convolution implemented as matrix multiplication using toeplitz matrices in python
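Under the same conventions, the whole procedure can be condensed into a short Python sketch (my own paraphrase using scipy.linalg.toeplitz; the function name conv2d_toeplitz and the example values are made up for illustration):

import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import convolve2d


def conv2d_toeplitz(I, F):
    # Step 2: size of the full convolution output.
    out_h = I.shape[0] + F.shape[0] - 1
    out_w = I.shape[1] + F.shape[1] - 1

    # Step 3: zero-pad the filter to the output size.
    F_pad = np.zeros((out_h, out_w))
    F_pad[:F.shape[0], :F.shape[1]] = F

    # Step 4: one Toeplitz matrix (out_w x n1) per row of the padded filter.
    blocks = [toeplitz(F_pad[r], np.r_[F_pad[r, 0], np.zeros(I.shape[1] - 1)])
              for r in range(out_h)]

    # Step 5: doubly blocked Toeplitz matrix; block (i, a) is blocks[i - a].
    H = np.zeros((out_h * out_w, I.size))
    for i in range(out_h):
        for a in range(I.shape[0]):
            if i - a >= 0:
                H[i * out_w:(i + 1) * out_w,
                  a * I.shape[1]:(a + 1) * I.shape[1]] = blocks[i - a]

    # Steps 6-8: vectorize the input, multiply, reshape the result.
    return (H @ I.flatten()).reshape(out_h, out_w)


I = np.array([[1., 2., 3.], [4., 5., 6.]])
F = np.array([[10., 20.], [30., 40.]])
assert np.allclose(conv2d_toeplitz(I, F), convolve2d(I, F))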


I think there is an error. The first element of the result should be 10*0 + 20*0 + 30*0 + 40*1 = 40. The element in position (2,2) should be 1*10 + 2*20 + 4*30 + 5*40 = 370. I think your result is correct for a matrix F equal to [40 30; 20 10], which is exactly F flipped along both rows and columns. There is therefore an error in the procedure.
It is doing convolution (mathematical convolution, not cross-correlation), so if you are doing it by hand, you need to flip the filter both vertically and horizontally. You can find more information on my GitHub repo.
This is a great explanation of 2D convolution as a matrix operation. Is there a way to represent "mode='same'" also? (i.e. keeping the output shape the same as the image)?
@ajl123 I think it should be. I'll work on it if I get time. Please feel free to dig into the code and math and send me a pull request on Github if you get the answer.
Shouldn't the dimension of the resulting matrix decrease?
Geoffrey Negiar

If you unravel k into an m**2 vector and unroll X, you would then get:

an m**2 vector k

a ((n-m)**2, m**2) matrix for unrolled_X

where unrolled_X could be obtained by the following Python code:

from numpy import zeros


def unroll_matrix(X, m):
    # Unroll each m x m window of X into one row of the output matrix.
    flat_X = X.flatten()
    n = X.shape[0]
    unrolled_X = zeros(((n - m) ** 2, m ** 2))
    skipped = 0
    for i in range(n ** 2):
        # Keep only positions whose window stays inside X
        # (note the integer division // for Python 3).
        if (i % n) < n - m and ((i // n) % n) < n - m:
            for j in range(m):
                for l in range(m):
                    unrolled_X[i - skipped, j * m + l] = flat_X[i + j * n + l]
        else:
            skipped += 1
    return unrolled_X

Unrolling X rather than k allows a more compact representation (smaller matrices) than the other way around, but you need to unroll each X. You might prefer unrolling k depending on what you want to do.

Here, unrolled_X is not sparse, whereas unrolled_k is sparse, but of size ((n-m+1)^2, n^2), as @Salvador Dali mentioned.

Unrolling k could be done like this:

from numpy import zeros
from scipy.sparse import lil_matrix


def unroll_kernel(kernel, n, sparse=True):
    # Unroll the kernel into a ((n-m)**2, n**2) matrix so that
    # unrolled_K @ X.flatten() computes the sliding-window products.
    m = kernel.shape[0]
    if sparse:
        unrolled_K = lil_matrix(((n - m) ** 2, n ** 2))
    else:
        unrolled_K = zeros(((n - m) ** 2, n ** 2))

    skipped = 0
    for i in range(n ** 2):
        # Keep only positions whose window stays inside the input
        # (integer division // for Python 3).
        if (i % n) < n - m and ((i // n) % n) < n - m:
            for j in range(m):
                for l in range(m):
                    unrolled_K[i - skipped, i + j * n + l] = kernel[j, l]
        else:
            skipped += 1
    return unrolled_K
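Assuming both helpers above, a quick consistency check (with made-up values) shows they compute the same sliding-window products, just with the roles of X and k swapped:

import numpy as np

X = np.arange(25, dtype=float).reshape(5, 5)  # made-up 5x5 input
k = np.array([[1., 2.], [3., 4.]])            # made-up 2x2 kernel
n, m = X.shape[0], k.shape[0]

# Both keep the (n - m) convention used above.
out_from_X = unroll_matrix(X, m) @ k.flatten()
out_from_k = unroll_kernel(k, n) @ X.flatten()
assert np.allclose(out_from_X, out_from_k)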

user8697906

The code shown above doesn't produce the unrolled matrix of the right dimensions. The dimensions should be ((n-k+1)*(m-k+1), k*k), where k is the filter dimension, n is the number of rows, and m is the number of columns of the input matrix.

from numpy import zeros


def unfold_matrix(X, k):
    n, m = X.shape[0:2]
    xx = zeros(((n - k + 1) * (m - k + 1), k ** 2))
    row_num = 0

    def make_row(x):
        return x.flatten()

    for i in range(n - k + 1):
        for j in range(m - k + 1):
            # Collect a k x k block of elements and convert it to a row.
            xx[row_num, :] = make_row(X[i:i + k, j:j + k])
            row_num = row_num + 1

    return xx
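For instance (a hypothetical check with made-up values), multiplying the unfolded input by the flattened filter reproduces a valid-mode sliding window:

import numpy as np
from scipy.signal import correlate2d

X = np.arange(20, dtype=float).reshape(4, 5)  # made-up 4x5 input
F = np.array([[1., 2.], [3., 4.]])            # made-up 2x2 filter
k = F.shape[0]

out = (unfold_matrix(X, k) @ F.flatten()).reshape(X.shape[0] - k + 1,
                                                  X.shape[1] - k + 1)

# This is cross-correlation; flip F along both axes first for convolution.
assert np.allclose(out, correlate2d(X, F, mode='valid'))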

For more details, see my blog post:

http://www.telesens.co/2018/04/09/initializing-weights-for-the-convolutional-and-fully-connected-layers/


Subscribe Now