Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks [closed]


rayryeng

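I'd like to explain using the figure from C3D.

In short, the convolution direction and the output shape are what matter!

https://i.stack.imgur.com/owWjX.png

↑↑↑↑↑ 1D convolutions - basic ↑↑↑↑↑

The convolution is computed along just 1 direction (the time axis)

input = [W], filter = [k], output = [W]

e.g.) input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1,1,1] (ignoring boundary effects; with zero padding the two end values are 0.75)

The output shape is a 1D array

example) graph smoothing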
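As a minimal NumPy sketch of the smoothing case above (the variable names are illustrative and not from the original answer; np.convolve with mode='same' zero-pads the ends):

```python
import numpy as np

signal = np.ones(5)
smoothing_filter = np.array([0.25, 0.5, 0.25])

# mode='same' zero-pads the ends so the output keeps the input length
smoothed = np.convolve(signal, smoothing_filter, mode='same')
print(smoothed)        # interior values are 1.0, the two end values are 0.75
print(smoothed.shape)  # (5,)
```

The kernel only ever moves along one axis, which is why the result is again a 1D array.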

tf.nn.conv1d code toy example

import tensorflow as tf
import numpy as np

sess = tf.Session()

ones_1d = np.ones(5)
weight_1d = np.ones(3)
strides_1d = 1

in_1d = tf.constant(ones_1d, dtype=tf.float32)
filter_1d = tf.constant(weight_1d, dtype=tf.float32)

in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])

input_1d   = tf.reshape(in_1d, [1, in_width, 1])
kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
print(sess.run(output_1d))

https://i.stack.imgur.com/hvMaU.png

↑↑↑↑↑ 2D convolutions - basic ↑↑↑↑↑

The convolution is computed along 2 directions (x, y)

The output shape is a 2D matrix

input = [W, H], filter = [k,k], output = [W,H]

example) Sobel edge filter
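To make the two directions concrete, here is a rough NumPy sketch of a Sobel filter sliding over a toy image. The helper conv2d_same and the toy image are my own assumptions for illustration, not part of the original answer; like CNN layers it computes a cross-correlation with zero padding.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide an odd-sized kernel over a zero-padded 2D image (cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode='constant')
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):        # direction 1: down the rows
        for x in range(image.shape[1]):    # direction 2: across the columns
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out

# Toy image: dark left half, bright right half (a vertical edge)
image = np.zeros((5, 6))
image[:, 3:] = 1.0

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

edges = conv2d_same(image, sobel_x)
print(edges.shape)  # (5, 6): same 2D shape as the input
print(edges)        # large values only in the columns next to the edge (plus a zero-padding artifact at the right border)
```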

tf.nn.conv2d - toy example

ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]

in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)

in_width = int(in_2d.shape[0])
in_height = int(in_2d.shape[1])

filter_width = int(filter_2d.shape[0])
filter_height = int(filter_2d.shape[1])

input_2d   = tf.reshape(in_2d, [1, in_height, in_width, 1])
kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))

https://i.stack.imgur.com/IvDQP.png

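↑↑↑↑↑ 3D convolutions - basic ↑↑↑↑↑

The convolution is computed along 3 directions (x, y, z)

The output shape is a 3D volume

input = [W,H,L], filter = [k,k,d], output = [W,H,M]

d < L is important! It is what makes the output a volume

example) C3D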
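The role of d < L can be checked with a small NumPy sketch. conv3d_valid is a hypothetical helper written for this illustration (single channel, no padding), so the output shrinks along every axis instead of keeping W and H:

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Slide a 3D kernel along all three axes of a single-channel volume (no padding)."""
    kd, kh, kw = kernel.shape
    out_shape = tuple(n - k + 1 for n, k in zip(volume.shape, kernel.shape))
    out = np.zeros(out_shape)
    for z in range(out_shape[0]):
        for y in range(out_shape[1]):
            for x in range(out_shape[2]):
                out[z, y, x] = np.sum(volume[z:z + kd, y:y + kh, x:x + kw] * kernel)
    return out

volume = np.ones((5, 5, 5))

# Kernel depth 3 < input depth 5: the kernel can move along depth, so the output is a volume
out_volume = conv3d_valid(volume, np.ones((3, 3, 3)))
print(out_volume.shape)  # (3, 3, 3)

# Kernel depth 5 == input depth 5: no room to move along depth, so depth collapses to 1
out_flat = conv3d_valid(volume, np.ones((5, 3, 3)))
print(out_flat.shape)    # (1, 3, 3)
```

The second case is effectively a 2D convolution over a multi-channel input.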

tf.nn.conv3d - toy example

ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_depth = int(in_3d.shape[2])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_depth = int(filter_3d.shape[2])

input_3d   = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])

output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
print(sess.run(output_3d))

https://i.stack.imgur.com/49cdt.png

↑↑↑↑↑ 2D Convolutions with 3D input - LeNet, VGG, ..., ↑↑↑↑↑

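Even though the input is 3D, ex) 224x224x3, 112x112x32

the output shape is not a 3D volume but a 2D matrix

because the filter depth = L must match the input channels = L

The convolution is computed along 2 directions (x, y), not 3!

input = [W,H,L], filter = [k,k,L], output = [W,H]

The output shape is a 2D matrix

What if we want to train N filters (N is the number of filters)?

Then the output shape is (stacked 2D) 3D = 2D x N matrix.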
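The shape bookkeeping above can be sketched in plain NumPy. conv2d_multichannel is a hypothetical helper for illustration only (zero padding, stride 1, odd kernel size), not how TensorFlow implements it:

```python
import numpy as np

def conv2d_multichannel(x, filters):
    """x: [H, W, L]; filters: [k, k, L, N]; returns [H, W, N] with zero padding."""
    k = filters.shape[0]
    p = k // 2
    padded = np.pad(x, ((p, p), (p, p), (0, 0)), mode='constant')
    h, w, _ = x.shape
    out = np.zeros((h, w, filters.shape[-1]))
    for i in range(h):
        for j in range(w):
            window = padded[i:i + k, j:j + k, :]   # [k, k, L]
            # Sum over k, k and all L channels at once: one number per filter
            out[i, j, :] = np.tensordot(window, filters, axes=3)
    return out

x = np.ones((5, 5, 32))
one_filter = np.ones((3, 3, 32, 1))
many_filters = np.ones((3, 3, 32, 64))

out_one = conv2d_multichannel(x, one_filter)
out_many = conv2d_multichannel(x, many_filters)
print(out_one.shape)   # (5, 5, 1): a single 2D map
print(out_many.shape)  # (5, 5, 64): 64 stacked 2D maps
```

The kernel still only moves along the two spatial axes; the channel axis is summed away at every position.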

conv2d - LeNet, VGG, ... for 1 filter

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shape with in_channels
weight_3d = np.ones((3,3,in_channels)) 
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))

conv2d - LeNet, VGG, ... for N filters

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shape x number of filters = 4D
weight_4d = np.ones((3,3,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))

https://i.stack.imgur.com/RghcS.png

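1x1 conv is confusing when you think of it as a 2D image filter like Sobel

For 1x1 conv in a CNN, the input has a 3D shape, as in the picture above.

It computes filtering along the depth (channel) axis

input = [W,H,L], filter = [1,1,L], output = [W,H]

The stacked output shape is 3D = 2D x N matrix.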
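One way to see what a 1x1 conv does: with no spatial extent, it reduces to multiplying every pixel's channel vector by the same matrix. A small NumPy sketch (the names x, w, y are illustrative):

```python
import numpy as np

in_channels, out_channels = 32, 64
x = np.ones((5, 5, in_channels))          # [H, W, L]
w = np.ones((in_channels, out_channels))  # a [1, 1, L, N] kernel with the two 1x1 axes dropped

# Each pixel's L-vector is multiplied by the same [L, N] matrix
y = x @ w
print(y.shape)     # (5, 5, 64)
print(y[0, 0, 0])  # 32.0
```

The spatial size is unchanged and only the number of channels goes from L to N.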

tf.nn.conv2d - special case 1x1 conv

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shape x number of filters = 4D; a 1x1 kernel is 1 wide and 1 high
weight_4d = np.ones((1,1,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))

Animation (2D conv with 3D inputs)

https://i.stack.imgur.com/FjvuN.gif

Original link: LINK

Author: Martin Görner

Twitter: @martin_gorner

Google+: plus.google.com/+MartinGorne

Bonus: 1D convolutions with 2D input

https://i.stack.imgur.com/woaXM.jpg

https://i.stack.imgur.com/9VBtu.jpg

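Even though the input is 2D, ex) 20x14

the output shape is not 2D but a 1D matrix

because the filter height = L must match the input height = L

The convolution is computed along 1 direction (x), not 2!

input = [W,L], filter = [k,L], output = [W]

The output shape is a 1D matrix

What if we want to train N filters (N is the number of filters)?

Then the output shape is (stacked 1D) 2D = 1D x N matrix.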
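A rough NumPy sketch of this case follows. I am assuming that 20 is the width W and 14 is the channel-like axis L, and conv1d_multichannel is a hypothetical helper without padding, so the width shrinks to W - k + 1:

```python
import numpy as np

def conv1d_multichannel(x, filters):
    """x: [W, L]; filters: [k, L, N]; returns [W - k + 1, N] (no padding)."""
    k = filters.shape[0]
    out_width = x.shape[0] - k + 1
    out = np.zeros((out_width, filters.shape[-1]))
    for i in range(out_width):  # the only direction the kernel moves in
        out[i, :] = np.tensordot(x[i:i + k, :], filters, axes=2)
    return out

x = np.ones((20, 14))  # e.g. 20 time steps with 14 features each

out_one = conv1d_multichannel(x, np.ones((3, 14, 1)))
out_many = conv1d_multichannel(x, np.ones((3, 14, 8)))
print(out_one.shape)   # (18, 1): one 1D output
print(out_many.shape)  # (18, 8): 8 stacked 1D outputs
```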

Bonus: C3D

in_channels = 32 # 3, 32, 64, 128, ... 
out_channels = 64 # 3, 32, 64, 128, ... 
ones_4d = np.ones((5,5,5,in_channels))
weight_5d = np.ones((3,3,3,in_channels,out_channels))
strides_3d = [1, 1, 1, 1, 1]

in_4d = tf.constant(ones_4d, dtype=tf.float32)
filter_5d = tf.constant(weight_5d, dtype=tf.float32)

in_width = int(in_4d.shape[0])
in_height = int(in_4d.shape[1])
in_depth = int(in_4d.shape[2])

filter_width = int(filter_5d.shape[0])
filter_height = int(filter_5d.shape[1])
filter_depth = int(filter_5d.shape[2])

input_4d   = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width, in_channels, out_channels])

output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')
print(sess.run(output_4d))

sess.close()

Input & output in TensorFlow

https://i.stack.imgur.com/I25ty.png

https://i.stack.imgur.com/xIdEq.png

Summary

https://i.stack.imgur.com/HCWgp.png

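Considering your effort and the clarity of the explanation, 8 upvotes is far too few.

The 2D conv with 3D input is a nice touch. I would suggest an edit to include 1D conv with 2D input (e.g. a multi-channel array) and compare how it differs from 2D conv with 2D input.

Amazing answer!

Why is the conv direction in 2D ↲? I have seen sources claiming that the direction is → for row 1, then → for row 1+stride. Convolution itself is shift invariant, so why does the direction of convolution matter?

Thank you for your question. Yes! Convolution itself is shift invariant, so the conv direction does not matter for the computation. (You can compute a 2D conv with two big matrix multiplications; the Caffe framework already does this.) But for understanding, it is better to explain it with a conv direction, because 2D conv with 3D input is confusing without one. ^^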
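For readers curious about the matrix-multiplication remark, here is a rough sketch of one widely used formulation, usually called im2col: unroll every patch into a row, then do a single matrix product. This is my own illustration under that assumption, not Caffe's actual code, and the helper im2col below is hypothetical.

```python
import numpy as np

def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D image into one row of a matrix."""
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    rows = [image[i:i + kh, j:j + kw].ravel()
            for i in range(out_h) for j in range(out_w)]
    return np.array(rows), (out_h, out_w)

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]])

patches, out_shape = im2col(image, 3, 3)  # 9 positions x 9 values per patch
result = (patches @ kernel.ravel()).reshape(out_shape)
print(patches.shape)  # (9, 9)
print(result.shape)   # (3, 3)
```

The order in which patches are visited only changes the row order of the patch matrix, which is why the direction does not affect the result.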


thushv89

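Following the answer from @runhani, I am adding a few more details to make the explanation a bit clearer, and I will try to explain this a bit more (with examples from both TF1 and TF2, of course).

The main additional bits I am including are:

Emphasis on applications

Usage of tf.Variable

Clearer explanation of inputs/kernels/outputs for 1D/2D/3D convolution

The effects of stride/padding

1D Convolution

Here is how you might calculate a 1D convolution using TF 1 and TF 2.

To be specific, my data has the following shapes:

1D vector - [batch size, width, in channels] (e.g. 1, 5, 1)

Kernel - [width, in channels, out channels] (e.g. 5, 1, 4)

Output - [batch size, width, out_channels] (e.g. 1, 5, 4)

TF1 example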

import tensorflow as tf
import numpy as np

inp = tf.placeholder(shape=[None, 5, 1], dtype=tf.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME')

with tf.Session() as sess:
  tf.global_variables_initializer().run()
  print(sess.run(out, feed_dict={inp: np.array([[[0],[1],[2],[3],[4]],[[5],[4],[3],[2],[1]]])}))

TF2 example

import tensorflow as tf
import numpy as np

inp = np.array([[[0],[1],[2],[3],[4]],[[5],[4],[3],[2],[1]]]).astype(np.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME')
print(out)

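TF2 takes far less work, since it does not need a Session or a variable initializer, for example.

What might this look like in real life?

So let's understand what this is doing using a signal smoothing example. On the left you have the original and on the right the output of a 1D convolution with 3 output channels.

https://i.stack.imgur.com/w23RC.png

What do multiple channels mean?

Multiple channels are basically multiple feature representations of an input. In this example, you have three representations obtained by three different filters. The first channel is the equally-weighted smoothing filter. The second is a filter that weights the middle of the filter more than the boundaries. The final filter does the opposite of the second. So you can see how these different filters bring about different effects.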
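A minimal NumPy sketch of the three-filter idea is below. The filter width (5), the exact weights, and the toy signal are all my own assumptions for illustration; the weights used for the figure are not given in this answer.

```python
import numpy as np

# Toy signal: a clean sine wave plus an alternating +/-0.2 disturbance
clean = np.sin(np.linspace(0, 2 * np.pi, 50))
signal = clean + 0.2 * (-1) ** np.arange(50)

equal = np.full(5, 0.2)                               # equally weighted
center_heavy = np.array([0.1, 0.2, 0.4, 0.2, 0.1])    # middle weighted more than the boundaries
edge_heavy = np.array([0.3, 0.15, 0.1, 0.15, 0.3])    # the opposite of the second filter

kernels = np.stack([equal, center_heavy, edge_heavy], axis=-1)  # [width, out_channels] = [5, 3]

# One output channel per filter, stacked along the last axis
out = np.stack([np.convolve(signal, kernels[:, c], mode='same') for c in range(3)],
               axis=-1)
print(out.shape)  # (50, 3): width 50, 3 output channels
```

Each column of out is a different view of the same input signal, which is all that "multiple channels" means here.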

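Deep learning applications of 1D convolution

1D convolution has been used successfully for the sentence classification task.

2D Convolution

On to 2D convolution. If you are a deep learning person, the chances that you have not come across 2D convolution are ... about zero. It is used in CNNs for image classification, object detection, etc., as well as in NLP problems that involve images (e.g. image caption generation).

Let's try an example. I have a convolution kernel with the following filters:

Edge detection kernel (3x3 window)

Blur kernel (3x3 window)

Sharpen kernel (3x3 window)

To be specific, my data has the following shapes:

Image (black and white) - [batch_size, height, width, 1] (e.g. 1, 340, 371, 1)

Kernel (aka filters) - [height, width, in channels, out channels] (e.g. 3, 3, 1, 3)

Output (aka feature maps) - [batch_size, height, width, out_channels] (e.g. 1, 340, 371, 3)

TF1 example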

import tensorflow as tf
import numpy as np
from PIL import Image

im = np.array(Image.open(<some image>).convert('L'))#/255.0
image_height, image_width = im.shape  # needed for the placeholder shape below

kernel_init = np.array(
    [
     [[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]],
     [[[-1, 1.0/9, -1]],[[8, 1.0/9,5]],[[-1, 1.0/9,-1]]],
     [[[-1, 1.0/9,0]],[[-1, 1.0/9,-1]],[[-1, 1.0/9, 0]]]
     ])

inp = tf.placeholder(shape=[None, image_height, image_width, 1], dtype=tf.float32)
kernel = tf.Variable(kernel_init, dtype=tf.float32)
out = tf.nn.conv2d(inp, kernel, strides=[1,1,1,1], padding='SAME')

with tf.Session() as sess:
  tf.global_variables_initializer().run()
  res = sess.run(out, feed_dict={inp: np.expand_dims(np.expand_dims(im,0),-1)})

TF2 example

import tensorflow as tf
import numpy as np
from PIL import Image

im = np.array(Image.open(<some image>).convert('L'))#/255.0
x = np.expand_dims(np.expand_dims(im,0),-1).astype(np.float32)  # cast so the input dtype matches the float32 kernel

kernel_init = np.array(
    [
     [[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]],
     [[[-1, 1.0/9, -1]],[[8, 1.0/9,5]],[[-1, 1.0/9,-1]]],
     [[[-1, 1.0/9,0]],[[-1, 1.0/9,-1]],[[-1, 1.0/9, 0]]]
     ])

kernel = tf.Variable(kernel_init, dtype=tf.float32)

out = tf.nn.conv2d(x, kernel, strides=[1,1,1,1], padding='SAME')
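The nested kernel_init literal is hard to read because the three 3x3 kernels are interleaved along the last axis. As a sketch, the same array can be built from three separate kernels (the names edge, blur and sharpen are mine):

```python
import numpy as np

edge = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]], dtype=float)
blur = np.full((3, 3), 1.0 / 9)
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)

# Stack along a new last axis (out channels), then insert the in-channel axis
kernel_init = np.stack([edge, blur, sharpen], axis=-1)[:, :, np.newaxis, :]
print(kernel_init.shape)  # (3, 3, 1, 3) = [height, width, in channels, out channels]
```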

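What might this look like in real life?

https://i.stack.imgur.com/NuldH.png

What do multiple channels mean?

Multiple channels are easier to understand in the context of 2D convolution. Say you are doing face recognition. You can think of each filter (this is a very unrealistic simplification, but it gets the point across) as representing an eye, a mouth, a nose, etc. Each feature map would then be a binary representation of whether that feature is present in the image you provided. I don't think I need to stress that these are very valuable features for a face recognition model. More information in this article.

Here is an illustration of what I am trying to articulate.

https://i.stack.imgur.com/9bi5k.gif

Deep learning applications of 2D convolution

2D convolution is very prevalent in the realm of deep learning.

CNNs (convolutional neural networks) use the 2D convolution operation for almost all computer vision tasks (e.g. image classification, object detection, video classification).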

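3D Convolution

Now it becomes increasingly difficult to illustrate what is going on as the number of dimensions increases. But with a good understanding of how 1D and 2D convolution work, it is very straightforward to generalize that understanding to 3D convolution. So here goes.

To be specific, my data has the following shapes:

3D data (LIDAR) - [batch size, height, width, depth, in channels] (e.g. 1, 200, 200, 200, 1)

Kernel - [height, width, depth, in channels, out channels] (e.g. 5, 5, 5, 1, 3)

Output - [batch size, height, width, depth, out_channels] (e.g. 1, 200, 200, 200, 3)

TF1 example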

import tensorflow as tf
import numpy as np

tf.reset_default_graph()

inp = tf.placeholder(shape=[None, 200, 200, 200, 1], dtype=tf.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5,5,5,1,3]), dtype=tf.float32)
out = tf.nn.conv3d(inp, kernel, strides=[1,1,1,1,1], padding='SAME')

with tf.Session() as sess:
  tf.global_variables_initializer().run()
  res = sess.run(out, feed_dict={inp: np.random.normal(size=(1,200,200,200,1))})

TF2 example

import tensorflow as tf
import numpy as np

x = np.random.normal(size=(1,200,200,200,1)).astype(np.float32)  # match the float32 kernel dtype
kernel = tf.Variable(tf.initializers.glorot_uniform()([5,5,5,1,3]), dtype=tf.float32)
out = tf.nn.conv3d(x, kernel, strides=[1,1,1,1,1], padding='SAME')

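Deep learning applications of 3D convolution

3D convolution has been used when developing machine learning applications involving LIDAR (Light Detection and Ranging) data, which is 3-dimensional in nature.

What... more jargon?: Stride and padding

Alright, you are nearly there, so hold on. Let's see what stride and padding are. They are quite intuitive if you think about them.

If you stride across a corridor, you get there faster and in fewer steps. But it also means that you observe less of your surroundings than if you walked across the room. Let's now reinforce our understanding with a pretty picture too! Let's understand these via 2D convolution.

Understanding stride

https://i.stack.imgur.com/XD2O4.png

When you use tf.nn.conv2d, for example, you need to set the stride as a vector of 4 elements. There is no reason to be intimidated by this. It just contains the strides in the following order.

2D Convolution - [batch stride, height stride, width stride, channel stride]. Here, you just leave the batch stride and channel stride set to 1 (I have been implementing deep learning models for 5 years and never had to set them to anything except 1). So that leaves you only 2 strides to set.

3D Convolution - [batch stride, height stride, width stride, depth stride, channel stride]. Here you only need to worry about the height/width/depth strides.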
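To see what the height and width strides do, here is a rough NumPy sketch. conv2d_valid is a hypothetical helper (single channel, no padding) written only for this illustration:

```python
import numpy as np

def conv2d_valid(image, kernel, stride_h=1, stride_w=1):
    """2D cross-correlation without padding, jumping stride_h rows / stride_w columns per step."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride_h + 1
    out_w = (image.shape[1] - kw) // stride_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride_h, j * stride_w
            out[i, j] = np.sum(image[top:top + kh, left:left + kw] * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3))

dense = conv2d_valid(image, kernel)                          # stride 1
strided = conv2d_valid(image, kernel, stride_h=2, stride_w=2)
print(dense.shape)    # (5, 5)
print(strided.shape)  # (3, 3)

# A stride of 2 simply skips every other position of the stride-1 result
print(np.allclose(strided, dense[::2, ::2]))  # True
```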

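Understanding padding

Now, you notice that no matter how small your stride is (i.e. 1), there is an unavoidable dimension reduction during convolution (e.g. a 4-unit-wide image convolved with a 2-unit-wide kernel has width 3 afterwards). This is undesirable, especially when building deep convolutional neural networks. This is where padding comes to the rescue. There are two most commonly used padding types:

SAME and VALID

Below you can see the difference.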

https://i.stack.imgur.com/O01D7.png
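The effect of the two padding types on the output size along one axis can be written down as a small helper. This follows the output-size rules TensorFlow documents for SAME and VALID; the function name output_size is mine.

```python
import math

def output_size(input_size, kernel_size, stride, padding):
    """Output length along one axis for TensorFlow-style 'SAME' or 'VALID' padding."""
    if padding == 'SAME':
        # Enough zeros are added that only the stride shrinks the output
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        # No padding: the kernel must fit entirely inside the input
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(output_size(4, 2, 1, 'VALID'))  # 3
print(output_size(4, 2, 1, 'SAME'))   # 4
print(output_size(7, 3, 2, 'VALID'))  # 3
print(output_size(7, 3, 2, 'SAME'))   # 4
```

With SAME padding and a stride of 1 the size is preserved, which is what makes it convenient for stacking many layers.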

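A final word: if you are very curious, you might be wondering: we just dropped a bomb on the whole automatic dimension reduction, and now we are talking about having different strides. But the best thing about stride is that you control when, where, and how the dimensions get reduced.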


zz x

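In summary, in a 1D CNN the kernel moves in 1 direction. The input and output data of a 1D CNN is 2-dimensional. It is mostly used on time-series data.

In a 2D CNN, the kernel moves in 2 directions. The input and output data of a 2D CNN is 3-dimensional. It is mostly used on image data.

In a 3D CNN, the kernel moves in 3 directions. The input and output data of a 3D CNN is 4-dimensional. It is mostly used on 3D image data (MRI, CT scans).

You can find more details here: https://medium.com/@xzz201920/conv1d-conv2d-and-conv3d-8a59182c4d6

It may be important to mention that, in CNN architectures, intermediate layers commonly have 2D outputs even when the input is only 1D to begin with.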


Jon

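CNN 1D, 2D, or 3D refers to the convolution direction, rather than the input or filter dimension. For a 1-channel input, a 2D CNN equals a 1D CNN if kernel length = input length (1 conv direction).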
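This equivalence can be checked numerically with a small NumPy sketch. The shapes (a 6x4 input and a 3x4 kernel) are arbitrary choices of mine, and no padding is used:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(6, 4)       # single-channel 2D input
kernel = rng.rand(3, 4)  # kernel spans the full second axis (length 4)

# 2D view: slide over both axes; the second axis has only one valid position
out_2d = np.array([[np.sum(x[i:i + 3, j:j + 4] * kernel)
                    for j in range(4 - 4 + 1)]
                   for i in range(6 - 3 + 1)])

# 1D view: treat the second axis as channels and slide along the first axis only
out_1d = np.array([np.sum(x[i:i + 3, :] * kernel) for i in range(6 - 3 + 1)])

print(out_2d.shape)  # (4, 1)
print(out_1d.shape)  # (4,)
print(np.allclose(out_2d[:, 0], out_1d))  # True
```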
