numpy: understanding keepdims=True

Notes on what keepdims does in numpy. This is a pitfall I ran into while implementing softmax.

Official documentation

The documentation describes the keepdims parameter of numpy.sum as follows:

numpy.sum(a, axis=None, dtype=None, out=None, keepdims=False)
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
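
To make the quoted sentence concrete, here is a minimal sketch of what "left in the result as dimensions with size one" looks like. The array a and the variable names are made up for illustration and are not part of the numpy documentation.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Without keepdims the reduced axis disappears from the result.
row_sums = np.sum(a, axis=1)
print(row_sums.shape)            # (2,)

# With keepdims=True the reduced axis stays as a size-1 dimension.
row_sums_kept = np.sum(a, axis=1, keepdims=True)
print(row_sums_kept.shape)       # (2, 1)

# The (2, 1) result broadcasts against the (2, 3) input row by row,
# so every element is divided by the sum of its own row.
fractions = a / row_sums_kept
```

Each row of fractions sums to 1, which is the "broadcast correctly against the input array" behavior the documentation promises.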

Let's work through it with a few experiments.

np.max(x)

>>> import numpy as np
>>> x=np.array([[1001, 1002], [3, 4]])
>>> x -= np.max(x)
>>> x
array([[  -1,    0],
       [-999, -998]])

np.max(x) returns a single number, 1002, the maximum over the whole matrix.
So x -= np.max(x) subtracts that global maximum from every element of x.

np.max(x, axis=1)

>>> x=np.array([[1001, 1002], [3, 4]])
>>> x -= np.max(x, axis=1)
>>> x
array([[  -1,  998],
       [-999,    0]])

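axis=1 means the reduction runs along each row, so np.max(x, axis=1) gives the maximum of each row. Note, however, that by default the result is a 1-D array of shape (2,), [1002, 4], which broadcasts as if it were a row vector.
With axis=0 the reduction runs along each column instead, giving the maximum of each column: [1001, 1002].
x -= np.max(x, axis=1) makes no sense here. Broadcasting still happens, but along the wrong axis: the 1-D result is lined up with the columns, so column i has the maximum of row i subtracted from it. For the shapes to match at all, the matrix generally has to be square.
What we actually need is to turn this row vector into a column vector of shape (2, 1), so that it will "broadcast correctly against the input array". That is exactly what keepdims does.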
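
The shapes involved can be checked directly. The following is a small sketch, assuming the same x as above; the non-square array y and all variable names are hypothetical additions used only to show where the version without keepdims stops working.

```python
import numpy as np

x = np.array([[1001, 1002], [3, 4]])

row_max = np.max(x, axis=1)                      # shape (2,), values [1002, 4]
row_max_col = np.max(x, axis=1, keepdims=True)   # shape (2, 1), values [[1002], [4]]

# Reshaping by hand gives the same column vector that keepdims produces.
manual_col = row_max[:, np.newaxis]
same = np.array_equal(manual_col, row_max_col)

# With a non-square matrix the 1-D result no longer lines up at all.
y = np.array([[1, 2, 3], [4, 5, 6]])             # shape (2, 3)
try:
    y - np.max(y, axis=1)                        # (2, 3) vs (2,): incompatible shapes
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# The keepdims version works for any number of columns.
ok = y - np.max(y, axis=1, keepdims=True)        # (2, 3) vs (2, 1)
```

On the square x the mistake is silent, which is why it is easy to miss; on y it surfaces as a ValueError.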

np.max(x, axis=1, keepdims=True): the right way to subtract each row's maximum

>>> import numpy as np
>>> x=np.array([[1001, 1002], [3, 4]])
>>> x -= np.max(x, axis=1, keepdims=True)
>>> x
array([[-1,  0],
       [-1,  0]])
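
Since this came up while implementing softmax, here is a rough sketch of how the row-wise subtraction might fit into one. The function name softmax_rows is hypothetical and this is only one common way to write it, assuming a 2-D input where each row is one sample.

```python
import numpy as np

def softmax_rows(x):
    """Row-wise softmax of a 2-D array (hypothetical helper for illustration)."""
    x = np.asarray(x, dtype=float)
    # Subtract each row's max so np.exp never sees a large positive value.
    shifted = x - np.max(x, axis=1, keepdims=True)
    exps = np.exp(shifted)
    # keepdims again, so each row is divided by its own sum.
    return exps / np.sum(exps, axis=1, keepdims=True)

probs = softmax_rows([[1001, 1002], [3, 4]])
```

Both rows of probs come out identical here, because after the shift each row is [-1, 0], and each row sums to 1.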