What is the difference between sparse_categorical_crossentropy and categorical_crossentropy?

python tensorflow machine-learning keras deep-learning

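What is the difference between sparse_categorical_crossentropy and categorical_crossentropy? When should one loss be used rather than the other? For example, are these losses suitable for linear regression?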

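The main difference is in your targets. Check this post: jovianlin.io/cat-crossentropy-vs-sparse-cat-crossentropy

@zihaozhihao The explanation there is good and clear. Can you answer the other questions I raised in the post?

For linear regression problems, MSE loss is typically used.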
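
As a minimal sketch of the comment above (the data values and the function name are made up for illustration), mean squared error compares real-valued predictions with real-valued targets, whereas the two cross-entropy losses expect class probabilities:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical regression data: continuous values, not class labels.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
print(mse)  # 0.375
```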

The jovianlin.io link doesn't work.

dturvene

Simply:

categorical_crossentropy (cce) produces a one-hot array containing the probable match for each category,

sparse_categorical_crossentropy (scce) produces a category index of the most likely matching category.

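Consider a classification problem with 5 categories (or classes).

In the case of cce, the one-hot target may be [0, 1, 0, 0, 0] and the model may predict [.2, .5, .1, .1, .1] (probably right).

In the case of scce, the target index may be [1] and the model may predict [.5].

Now consider a classification problem with 3 classes.

In the case of cce, the one-hot target might be [0, 0, 1] and the model may predict [.5, .1, .4] (probably inaccurate, given that it gives more probability to the first class).

In the case of scce, the target index might be [0] and the model may predict [.5].

Many categorical models produce scce output because you save space, but you lose A LOT of information (for example, in the second example, index 2 was also very close). I generally prefer cce output for model reliability.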

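There are a number of situations in which to use scce, including:

when your classes are mutually exclusive, i.e. you don't care at all about other close-enough predictions,

when the number of categories is so large that the prediction output becomes overwhelming.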

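220405: response to the "one-hot encoding" comments:

one-hot encoding is used for a category feature INPUT to select a specific category (e.g. male versus female). This encoding allows the model to train more efficiently: the training weight is a product of the category, which is 0 for all categories except the given one.

cce and scce are a model OUTPUT. cce is a probability array over each category, totaling 1.0. scce shows the MOST LIKELY category, totaling 1.0.

scce is technically a one-hot array, just as a hammer used as a door stop is still a hammer, but its purpose is different. cce is NOT one-hot.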

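I think this is incorrect: sparse_categorical_crossentropy still expects the predictions to be one-hot encoded.

This answer is completely wrong. How did it get upvoted and accepted? The model output is still one-hot encoded; it is only the ground truth that is stored as an index.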
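
The point raised in these comments can be checked by hand. The sketch below is plain Python rather than Keras, and the variable names are made up for illustration. It assumes the 5-class example from the answer: the prediction stays a full vector with one probability per class, and only the target changes form (a one-hot vector for cce, a class index for scce).

```python
import math

probs = [0.2, 0.5, 0.1, 0.1, 0.1]        # model output: one probability per class
target_one_hot = [0, 1, 0, 0, 0]         # how cce expects the target
target_index = target_one_hot.index(1)   # how scce expects the same target

# The predicted class is the position of the largest probability.
predicted_class = max(range(len(probs)), key=lambda i: probs[i])

# Either loss reduces to the negative log of the probability given to the true class.
loss = -math.log(probs[target_index])

print(target_index, predicted_class, round(loss, 3))  # 1 1 0.693
```

A single-element prediction such as [.5] therefore never appears: the index form applies to the label, not to the model output.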

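I updated the answer to make it clearer. The criticisms are accurate but orthogonal to the OUTPUT of cce and scce.

What does [.5] mean? This is the most confusing part.

Yes, some prior knowledge is assumed in the answer. Briefly, a target has an unbounded set of categories, each with a normalized probability that it is the target. All the probabilities in the set must sum to 1.0. So [.5] means this category is generally trained to be the target in roughly half of the training runs. Not very good! However, given five categories with probabilities [.2 .5 .1 .1 .1], category 1 (counting from 0) is the best guess. The higher the category value, the more likely the match. I can't say whether 0.5 equals 50% accuracy.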

Bitswazsky

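I was also confused by this one. Fortunately, the excellent Keras documentation came to the rescue. Both use the same loss function and ultimately do the same thing; the only difference is in the representation of the true labels.

Categorical Cross Entropy [Doc]:

Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation.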

>>> import tensorflow as tf
>>> y_true = [[0, 1, 0], [0, 0, 1]]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.  
>>> cce = tf.keras.losses.CategoricalCrossentropy()
>>> cce(y_true, y_pred).numpy()
1.177

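Sparse Categorical Cross Entropy [Doc]:

Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers.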

>>> y_true = [1, 2]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using 'auto'/'sum_over_batch_size' reduction type.  
>>> scce = tf.keras.losses.SparseCategoricalCrossentropy()
>>> scce(y_true, y_pred).numpy()
1.177
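
The value 1.177 in both examples can be reproduced by hand. The following is a rough sketch in plain Python, not the Keras implementation, and the function names are assumptions chosen for illustration. Each sample contributes the negative log of the probability assigned to its true class, and the default reduction averages over the batch.

```python
import math

# Predictions shared by both Keras examples: one probability row per sample.
y_pred = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]]

def mean_cce(y_true_one_hot, y_pred):
    """Batch-averaged cross-entropy with one-hot targets."""
    losses = []
    for target, probs in zip(y_true_one_hot, y_pred):
        # Only the true class (t == 1) contributes; zeros are skipped,
        # so log(0) is never evaluated.
        losses.append(-sum(math.log(p) for t, p in zip(target, probs) if t == 1))
    return sum(losses) / len(losses)

def mean_scce(y_true_index, y_pred):
    """Batch-averaged cross-entropy with integer class-index targets."""
    losses = [-math.log(probs[i]) for i, probs in zip(y_true_index, y_pred)]
    return sum(losses) / len(losses)

cce_value = mean_cce([[0, 1, 0], [0, 0, 1]], y_pred)
scce_value = mean_scce([1, 2], y_pred)

print(round(cce_value, 3), round(scce_value, 3))  # 1.177 1.177
```

The two results agree because (-ln 0.95 - ln 0.1) / 2 is computed in both cases; the label format only changes how the true-class probability is looked up.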

One good example of sparse-categorical-cross-entropy is the fashion-mnist dataset.

import tensorflow as tf
from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

print(y_train_full.shape) # (60000,)
print(y_train_full.dtype) # uint8

y_train_full[:10]
# array([9, 0, 0, 3, 0, 2, 7, 2, 5, 5], dtype=uint8)

This should be the accepted answer.

nbro

From the TensorFlow source code, sparse_categorical_crossentropy is defined as categorical crossentropy with integer targets:

def sparse_categorical_crossentropy(target, output, from_logits=False, axis=-1):
  """Categorical crossentropy with integer targets.
  Arguments:
      target: An integer tensor.
      output: A tensor resulting from a softmax
          (unless `from_logits` is True, in which
          case `output` is expected to be the logits).
      from_logits: Boolean, whether `output` is the
          result of a softmax, or is a tensor of logits.
      axis: Int specifying the channels axis. `axis=-1` corresponds to data
          format `channels_last', and `axis=1` corresponds to data format
          `channels_first`.
  Returns:
      Output tensor.
  Raises:
      ValueError: if `axis` is neither -1 nor one of the axes of `output`.
  """

From the TensorFlow source code, categorical_crossentropy is defined as categorical crossentropy between an output tensor and a target tensor:

def categorical_crossentropy(target, output, from_logits=False, axis=-1):
  """Categorical crossentropy between an output tensor and a target tensor.
  Arguments:
      target: A tensor of the same shape as `output`.
      output: A tensor resulting from a softmax
          (unless `from_logits` is True, in which
          case `output` is expected to be the logits).
      from_logits: Boolean, whether `output` is the
          result of a softmax, or is a tensor of logits.
      axis: Int specifying the channels axis. `axis=-1` corresponds to data
          format `channels_last', and `axis=1` corresponds to data format
          `channels_first`.
  Returns:
      Output tensor.
  Raises:
      ValueError: if `axis` is neither -1 nor one of the axes of `output`.
  """

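Both docstrings mention `from_logits`. As a rough sketch of what that flag distinguishes (plain Python, not the TensorFlow implementation; the logit values and the names `softmax` and `sparse_ce` are made up for illustration): logits are raw scores, and a softmax turns them into probabilities before the cross-entropy is taken.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_ce(target_index, output, from_logits=False):
    """Cross-entropy for one sample with an integer target (illustrative only)."""
    probs = softmax(output) if from_logits else output
    return -math.log(probs[target_index])

logits = [2.0, 1.0, 0.1]   # hypothetical raw scores for 3 classes
probs = softmax(logits)

loss_from_logits = sparse_ce(0, logits, from_logits=True)
loss_from_probs = sparse_ce(0, probs, from_logits=False)

print(loss_from_logits, loss_from_probs)  # the two values are equal
```

Passing logits while leaving `from_logits=False` would take the log of raw scores instead of probabilities, so the flag has to match what the last layer of the model emits.
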
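Integer targets means that the target labels should be given as a list of integers holding the class indices, for example:

For sparse_categorical_crossentropy, with targets of class 1 and class 2 in a 5-class classification problem, the list should be [1,2]. Basically, the targets must be in integer form in order to call sparse_categorical_crossentropy. It is called sparse because the target representation requires much less space than one-hot encoding. For example, a batch with b targets and k classes needs b * k space in one-hot form, whereas the same batch needs only b space in integer form.

For categorical_crossentropy, with targets of class 1 and class 2 in a 5-class classification problem, the list should be [[0,1,0,0,0], [0,0,1,0,0]]. Basically, the targets must be in one-hot form in order to call categorical_crossentropy.

The representation of the targets is the only difference; the results should be the same, since both compute categorical crossentropy.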
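
To make the storage comparison concrete, here is a small sketch in plain Python. The helper `to_one_hot` is hypothetical, written only for this illustration (Keras provides `tf.keras.utils.to_categorical` for the same conversion). It turns the integer targets [1, 2] of a 5-class problem into one-hot rows and counts the stored values, b versus b * k.

```python
def to_one_hot(indices, num_classes):
    """Turn a list of class indices into a list of one-hot rows."""
    return [[1 if c == i else 0 for c in range(num_classes)] for i in indices]

sparse_targets = [1, 2]   # b = 2 targets, stored as class indices
num_classes = 5           # k = 5

one_hot_targets = to_one_hot(sparse_targets, num_classes)

sparse_size = len(sparse_targets)                         # b values
one_hot_size = sum(len(row) for row in one_hot_targets)   # b * k values

print(one_hot_targets)            # [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0]]
print(sparse_size, one_hot_size)  # 2 10
```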

When I used categorical_crossentropy with integer labels, TensorFlow didn't complain. I would have expected an error. Do you know why this is the case?
