Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers. This question does not appear to be about programming within the scope defined in the help center. Closed last year. Improve this question
I know that, in the 1D case, the convolution between two vectors, a
and b
, can be computed as conv(a, b)
, but also as the product between the T_a
and b
, where T_a
is the corresponding Toeplitz matrix for a
.
Is it possible to extend this idea to 2D?
Given a = [5 1 3; 1 1 2; 2 1 3]
and b=[4 3; 1 2]
, is it possible to convert a
in a Toeplitz matrix and compute the matrix-matrix product between T_a
and b
as in the 1-D case?
deep-learning
tag info.
Yes, it is possible and you should also use a doubly block circulant matrix (which is a special case of Toeplitz matrix). I will give you an example with a small size of kernel and the input, but it is possible to construct Toeplitz matrix for any kernel. So you have a 2d input x
and 2d kernel k
and you want to calculate the convolution x * k
. Also let's assume that k
is already flipped. Let's also assume that x
is of size n×n
and k
is m×m
.
So you unroll k
into a sparse matrix of size (n-m+1)^2 × n^2
, and unroll x
into a long vector n^2 × 1
. You compute a multiplication of this sparse matrix with a vector and convert the resulting vector (which will have a size (n-m+1)^2 × 1
) into a n-m+1
square matrix.
I am pretty sure this is hard to understand just from reading. So here is an example for 2×2 kernel and 3×3 input.
https://i.stack.imgur.com/FlUZU.png
Here is a constructed matrix with a vector:
https://i.stack.imgur.com/yLd8G.png
https://i.stack.imgur.com/oMhJs.png
And this is the same result you would have got by doing a sliding window of k
over x
.
1- Define Input and Filter
Let I be the input signal and F be the filter or kernel.
https://i.stack.imgur.com/zjGRw.png
2- Calculate the final output size
If the I is m1 x n1 and F is m2 x n2 the size of the output will be:
https://i.stack.imgur.com/yLSFB.png
3- Zero-pad the filter matrix
Zero pad the filter to make it the same size as the output.
https://i.stack.imgur.com/XUTfK.png
4- Create Toeplitz matrix for each row of the zero-padded filter
https://i.stack.imgur.com/vblU4.png
5- Create a doubly blocked Toeplitz matrix
https://i.stack.imgur.com/DaAPx.png
https://i.stack.imgur.com/gk5MI.png
6- Convert the input matrix to a column vector
https://i.stack.imgur.com/QI0O6.png
7- Multiply doubly blocked toeplitz matrix with vectorized input signal
This multiplication gives the convolution result.
8- Last step: reshape the result to a matrix form
https://i.stack.imgur.com/a5yWA.png
For more details and python code take a look at my github repository:
If you unravel k to a m^2 vector and unroll X, you would then get:
a m**2 vectork
a ((n-m)**2, m**2) matrix for unrolled_X
where unrolled_X
could be obtained by the following Python code:
from numpy import zeros
def unroll_matrix(X, m):
flat_X = X.flatten()
n = X.shape[0]
unrolled_X = zeros(((n - m) ** 2, m**2))
skipped = 0
for i in range(n ** 2):
if (i % n) < n - m and ((i / n) % n) < n - m:
for j in range(m):
for l in range(m):
unrolled_X[i - skipped, j * m + l] = flat_X[i + j * n + l]
else:
skipped += 1
return unrolled_X
Unrolling X and not k allows a more compact representation (smaller matrices) than the other way around for each X - but you need to unroll each X. You could prefer unrolling k depending on what you want to do.
Here, the unrolled_X
is not sparse, whereas unrolled_k
would be sparse, but of size ((n-m+1)^2,n^2)
as @Salvador Dali mentioned.
Unrolling k
could be done like this:
from scipy.sparse import lil_matrix
from numpy import zeros
import scipy
def unroll_kernel(kernel, n, sparse=True):
m = kernel.shape[0]
if sparse:
unrolled_K = lil_matrix(((n - m)**2, n**2))
else:
unrolled_K = zeros(((n - m)**2, n**2))
skipped = 0
for i in range(n ** 2):
if (i % n) < n - m and((i / n) % n) < n - m:
for j in range(m):
for l in range(m):
unrolled_K[i - skipped, i + j * n + l] = kernel[j, l]
else:
skipped += 1
return unrolled_K
The code shown above doesn't produce the unrolled matrix of the right dimensions. The dimension should be (n-k+1)*(m-k+1), (k)(k). k: filter dimension, n: num rows in input matrix, m: num columns.
def unfold_matrix(X, k):
n, m = X.shape[0:2]
xx = zeros(((n - k + 1) * (m - k + 1), k**2))
row_num = 0
def make_row(x):
return x.flatten()
for i in range(n- k+ 1):
for j in range(m - k + 1):
#collect block of m*m elements and convert to row
xx[row_num,:] = make_row(X[i:i+k, j:j+k])
row_num = row_num + 1
return xx
For more details, see my blog post:
Success story sharing
Also let's assume that k is already flipped
? Is it because we want to perform correlation instead of convolution? What isflipped
in terms of numpy operations?flip(m, 0)
, which is is equivalent toflipud(m)
.