Section 18: Using Batch Normalization in TensorFlow

Posted: 2018-05-06 00:42:03

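The concept of batch normalization was introduced in the deep learning chapters. For details, see Section 9, Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (Part 3).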

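In a deep network, the distribution of each layer's inputs differs from layer to layer. It also shifts during training as the parameters of the preceding layers change. This can push activations into the saturated regions of the nonlinearities, and batch normalization was proposed to alleviate the problem. In practice, batch normalization also makes training converge much faster and generalizes well. In some cases it can even replace other regularization methods such as dropout.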

I. Batch Normalization Functions

For a mini-batch B = {x_1, ..., x_m}, the normalization algorithm can be described as follows:

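$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$$
$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$$
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$
$$y_i = \gamma \hat{x}_i + \beta$$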
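The four steps are: take the batch mean, take the batch variance, normalize, then scale by gamma and shift by beta. To make them concrete, here is a minimal plain-Python sketch for a single feature. It is not TensorFlow code, and the function name `batch_norm_1d` is hypothetical. The default eps of 0.001 matches the epsilon default of the layers version described later.

```python
import math

def batch_norm_1d(xs, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature over a mini-batch, then scale and shift.

    xs: list of floats, one value per example in the mini-batch.
    Returns y_i = gamma * x_hat_i + beta for every example.
    """
    m = len(xs)
    mu = sum(xs) / m                                       # mini-batch mean
    var = sum((x - mu) ** 2 for x in xs) / m               # mini-batch (biased) variance
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in xs]  # normalize
    return [gamma * h + beta for h in x_hat]               # scale by gamma, shift by beta

ys = batch_norm_1d([1.0, 2.0, 3.0, 4.0], eps=0.0)
print([round(y, 4) for y in ys])  # → [-1.3416, -0.4472, 0.4472, 1.3416]
```

With gamma=1 and beta=0, the output has mean 0 and variance 1 (up to eps). The learned gamma and beta let the network undo this if a different scale or offset works better.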

1. Definition of TensorFlow's built-in BN function, tf.nn.batch_normalization:

def batch_normalization(x,
                        mean,
                        variance,
                        offset,
                        scale,
                        variance_epsilon,
                        name=None):

The parameters are:

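x: An input tensor of arbitrary dimensionality.
mean: The mean, e.g. the batch mean.
variance: The variance, e.g. the batch variance.
offset: The offset, a value added to the normalized result. This is beta in the formula and may be None.
scale: The scale, a value the normalized result is multiplied by. This is gamma in the formula and may be None.
variance_epsilon: A small float added to the variance (the denominator) to avoid division by zero. It has no default here and must be passed explicitly; a value such as 0.001 is common.
name: An optional name for the operation.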
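tf.nn.batch_normalization does not compute any statistics itself. It only applies the formula with whatever mean and variance you pass in. The plain-Python sketch below mirrors that element-wise computation for a single feature. The function name `apply_batch_norm` is hypothetical, and TensorFlow's real op works on whole tensors with broadcasting. As in the parameter list above, offset and scale may be None.

```python
import math

def apply_batch_norm(xs, mean, variance, offset, scale, variance_epsilon):
    """Element-wise (x - mean) / sqrt(variance + eps) * scale + offset."""
    inv = 1.0 / math.sqrt(variance + variance_epsilon)
    if scale is not None:      # gamma is optional
        inv *= scale
    shift = -mean * inv
    if offset is not None:     # beta is optional
        shift += offset
    return [x * inv + shift for x in xs]

# The statistics come from the caller, e.g. moving averages at inference time.
out = apply_batch_norm([0.0, 2.0, 4.0], mean=2.0, variance=4.0,
                       offset=1.0, scale=None, variance_epsilon=0.0)
print(out)  # → [0.0, 1.0, 2.0]
```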

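To use this function, you must pair it with another function, tf.nn.moments, which computes the mean and variance. Those results can then be passed to batch_normalization.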

2. Definition of tf.nn.moments():

def moments(x, axes, shift=None, name=None, keep_dims=False):

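x: The input tensor.
axes: The axes along which to compute the mean and variance.
shift: A `Tensor` containing the value by which to shift the data for numerical stability, or `None`, in which case the true mean of the data is used as the shift. A shift close to the true mean gives the most numerically stable results.
name: An optional name for the operation.
keep_dims: Whether to keep the reduced axes (with length 1), so that the results have the same rank as the input.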
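The following plain-Python sketch shows what tf.nn.moments(x, axes=[0]) produces for a 2-D batch of shape [batch_size, num_features]: one mean and one variance per feature. The helper `moments_axis0` is hypothetical and is not TensorFlow's implementation. It also illustrates the idea behind shift: statistics computed from the data shifted by a value close to the mean give the same result with less cancellation error. The last lines pair the statistics with the normalization step, the same way moments and batch_normalization are used together.

```python
import math

def moments_axis0(x, shift=None, keep_dims=False):
    """Per-feature mean and variance of a 2-D list x[batch][feature] (axes=[0]).

    The statistics are computed from the shifted data x - shift. If shift is
    None, the true mean of each feature is used as the shift.
    """
    n = len(x)
    num_features = len(x[0])
    if shift is None:
        shift = [sum(row[j] for row in x) / n for j in range(num_features)]
    means, variances = [], []
    for j in range(num_features):
        d = [row[j] - shift[j] for row in x]   # shifted data
        m1 = sum(d) / n                        # E[x - s]
        m2 = sum(v * v for v in d) / n         # E[(x - s)^2]
        means.append(m1 + shift[j])            # E[x] = E[x - s] + s
        variances.append(m2 - m1 * m1)         # Var[x] = E[(x - s)^2] - E[x - s]^2
    if keep_dims:
        # Keep the reduced batch axis with length 1: shape [1, num_features].
        return [means], [variances]
    return means, variances

batch = [[1.0, 10.0],
         [3.0, 20.0]]
mean, var = moments_axis0(batch)
print(mean, var)  # → [2.0, 15.0] [1.0, 25.0]

# Pair the statistics with the normalization step (offset=0, scale=1, epsilon=0 here).
normalized = [[(row[j] - mean[j]) / math.sqrt(var[j]) for j in range(len(row))]
              for row in batch]
print(normalized)  # → [[-1.0, -1.0], [1.0, 1.0]]
```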

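These two functions alone are not enough. For better results, we want to smooth the per-batch mean and variance with an exponentially weighted moving average. These moving averages are what is used at inference time. The tf.train.ExponentialMovingAverage() class does this: each previous value still influences the current one with a decaying weight, so the sequence of values becomes smoother. For details, see Section 8, Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (Part 2).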

What this class computes can be written as:

shadow_variable = decay * shadow_variable + (1 - decay) * variable

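The terms are:

decay: The decay rate, specified when constructing ExponentialMovingAverage(). It is close to 1, commonly 0.9.
variable: The value from the current batch.
shadow_variable on the right-hand side: The accumulated value from the previous step.
shadow_variable on the left-hand side: The accumulated value after the current step.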
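Here is a minimal plain-Python sketch of this update rule applied to a stream of per-batch values. The helper `ema_update` is hypothetical. For illustration, the shadow value is assumed to start at 0; TensorFlow initializes the shadow copy of a Variable from that variable's initial value instead.

```python
def ema_update(shadow, value, decay):
    """One step: shadow = decay * shadow + (1 - decay) * value."""
    return decay * shadow + (1 - decay) * value

decay = 0.9
shadow = 0.0                     # assumed starting value for this illustration
for batch_mean in [1.0, 1.0, 1.0, 1.0, 1.0]:
    shadow = ema_update(shadow, batch_mean, decay)
print(round(shadow, 5))  # → 0.40951
```

After five steps of the constant value 1.0 the average is only 1 - 0.9^5, about 0.41. Starting from 0 biases the early averages toward 0, which is the kind of start-up bias that options such as zero_debias (listed below) address.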

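Intuitively, shadow_variable approximates an average over roughly the last 1/(1 - decay) values, for example about 10 values when decay = 0.9. The current value has weight (1 - decay), the previous one (1 - decay)·decay, the one before that (1 - decay)·decay², and so on.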
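The weight interpretation can be checked numerically. The plain-Python sketch below uses hypothetical function names. It compares the recursive update with the explicit weighted sum, in which the k-th most recent value gets weight (1 - decay)·decay^k and the leftover weight decay^t stays on the initial value.

```python
def ema_recursive(values, decay, init=0.0):
    """Apply shadow = decay * shadow + (1 - decay) * v for each value in order."""
    shadow = init
    for v in values:
        shadow = decay * shadow + (1 - decay) * v
    return shadow

def ema_unrolled(values, decay, init=0.0):
    """Same result written as an explicit weighted sum."""
    t = len(values)
    # Newest value: weight (1 - decay); the one before: (1 - decay) * decay; ...
    total = sum((1 - decay) * decay ** k * v for k, v in enumerate(reversed(values)))
    return total + decay ** t * init   # remaining weight on the initial value

values = [3.0, 1.0, 4.0, 1.0, 5.0]
print(abs(ema_recursive(values, 0.9) - ema_unrolled(values, 0.9)) < 1e-12)  # → True
print(round(1 / (1 - 0.9)))  # effective window for decay = 0.9 → 10
```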

3. Definition of the tf.train.ExponentialMovingAverage class:

  def __init__(self, decay, num_updates=None, zero_debias=False,
               name="ExponentialMovingAverage"):
  def apply(self, var_list=None):

The parameters are:

decay: Float. The decay to use.
num_updates: Optional count of number of updates applied to variables. The actual decay rate used is: `min(decay, (1 + num_updates) / (10 + num_updates))`
zero_debias: If `True`, zero debias moving-averages that are initialized with tensors.
name: String. Optional prefix name to use for the name of ops added in.
var_list (argument of apply()): A list of Variable or Tensor objects. The variables and Tensors must be of types float16, float32, or float64.

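Calling apply() returns an op that updates the exponential moving averages. Each time that op is run, the shadow variables are updated, and average(var) returns the shadow variable for var.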
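The plain-Python stand-in below mimics the behaviour described above. It is not the TensorFlow class: `SimpleEMA`, its constructor flag and its methods are hypothetical. It follows the documented decay rule min(decay, (1 + num_updates) / (10 + num_updates)). Each shadow value starts at the value first passed in, which is an assumption modelled on how TensorFlow initializes shadow copies of Variables.

```python
class SimpleEMA:
    """Hypothetical stand-in for the moving-average logic (not the TF class)."""

    def __init__(self, decay, use_num_updates=False):
        self.decay = decay
        self.use_num_updates = use_num_updates
        self.shadow = {}

    def _effective_decay(self, num_updates):
        if not self.use_num_updates:
            return self.decay
        # Documented rule: min(decay, (1 + num_updates) / (10 + num_updates))
        return min(self.decay, (1 + num_updates) / (10 + num_updates))

    def apply(self, values, num_updates=0):
        """values: dict name -> current float value; updates every shadow value."""
        d = self._effective_decay(num_updates)
        for name, v in values.items():
            if name not in self.shadow:
                self.shadow[name] = v          # shadow starts at the variable's value
            self.shadow[name] = d * self.shadow[name] + (1 - d) * v

    def average(self, name):
        return self.shadow[name]

ema = SimpleEMA(0.99, use_num_updates=True)
ema.apply({"w": 0.0}, num_updates=0)   # decay = min(0.99, 1/10) = 0.1
ema.apply({"w": 1.0}, num_updates=1)   # decay = min(0.99, 2/11)
print(round(ema.average("w"), 4))  # → 0.8182
```

With num_updates, the effective decay is small early in training, so the average follows the variable quickly. Once num_updates is large enough, the effective decay settles at the configured value.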

II. Simple Usage of Batch Normalization

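The functions above take few parameters, but several of them must be combined. TensorFlow's layers module therefore provides its own BN function that merges these steps into one and is simpler to use. It is introduced below. To use it, import: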

from tensorflow.contrib.layers.python.layers import batch_norm

Alternatively, call tf.contrib.layers.batch_norm() directly. It is defined as follows:

def batch_norm(inputs,
               decay=0.999,
               center=True,
               scale=False,
               epsilon=0.001,
               activation_fn=None,
               param_initializers=None,
               param_regularizers=None,
               updates_collections=ops.GraphKeys.UPDATE_OPS,
               is_training=True,
               reuse=None,
               variables_collections=None,
               outputs_collections=None,
               trainable=True,
               batch_weights=None,
               fused=False,
               data_format=DATA_FORMAT_NHWC,
               zero_debias_moving_mean=False,
               scope=None,
               renorm=False,
               renorm_clipping=None,
               renorm_decay=0.99):

The parameters are:

inputs: A tensor with 2 or more dimensions, where the first dimension has `batch_size`. The normalization is over all but the last dimension if `data_format` is `NHWC` and the second dimension if `data_format` is `NCHW`.
decay: Decay for the moving average. Reasonable values for `decay` are close to 1.0, typically in the multiple-nines range: 0.999, 0.99, 0.9, etc. Lower `decay` value (recommend trying `decay`=0.9) if model experiences reasonably good training performance but poor validation and/or test performance. Try zero_debias_moving_mean=True for improved stability. This is the decay rate of the exponential moving average used to update the mean and variance, and it is commonly set to 0.9. If the value is too small, the moving mean and variance follow each batch too closely. If it is too large, they update very slowly and lag behind the statistics of the network being trained, so training performance looks good while validation or test performance is poor. In that case, lower the value.
center: If True, add offset of `beta` to normalized tensor. If False, `beta` is ignored.
scale: If True, multiply by `gamma`. If False, `gamma` is not used. When the next layer is linear (also e.g. `nn.relu`), this can be disabled since the scaling can be done by the next layer. BN is usually followed by a linear layer or by ReLU, which commutes with positive scaling, so the following layer can absorb the scaling. That is why scale is usually left at False.
epsilon: Small float added to variance to avoid dividing by zero. The default is usually fine.
activation_fn: Activation function, default set to None to skip it and maintain a linear activation.
param_initializers: Optional initializers for beta, gamma, moving mean and moving variance.
param_regularizers: Optional regularizer for beta and gamma.
updates_collections: Collections to collect the update ops for computation. The updates_ops need to be executed with the train_op. If None, a control dependency would be added to make sure the updates are computed in place. By default, the ops that update the moving mean and variance are added to tf.GraphKeys.UPDATE_OPS and are not run automatically. They have to be executed together with train_op, typically by making train_op depend on them through tf.control_dependencies. If they are never run, moving_mean and moving_variance keep their initial values, and inference with is_training=False gives poor results. Setting this parameter to None makes the layer update the moving statistics in place each time it runs in training mode. This is somewhat slower than the default, but it removes the risk of forgetting the update ops, which is why it is often set to None.
is_training: Whether or not the layer is in training mode. In training mode it would accumulate the statistics of the moments into `moving_mean` and `moving_variance` using an exponential moving average with the given `decay`. When it is not in training mode then it would use the values of the `moving_mean` and the `moving_variance`.
reuse: Whether or not the layer and its variables should be reused. To be able to reuse the layer scope must be given. Used together with scope to share variables.
variables_collections: Optional collections for the variables.
outputs_collections: Collections to add the outputs.
trainable: If `True` also add variables to the graph collection `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
batch_weights: An optional tensor of shape `[batch_size]`, containing a frequency weight for each batch item. If present, then the batch normalization uses weighted mean and variance. (This can be used to correct for bias in training example selection.)
fused: Use nn.fused_batch_norm if True, nn.batch_normalization otherwise.
data_format: A string. `NHWC` (default) and `NCHW` are supported.
zero_debias_moving_mean: Use zero_debias for moving_mean. It creates a new pair of variables 'moving_mean/biased' and 'moving_mean/local_step'.
scope: Optional scope for `variable_scope`.
renorm: Whether to use Batch Renormalization (https://arxiv.org/abs/1702.03275). This adds extra variables during training. The inference is the same for either value of this parameter.
renorm_clipping: A dictionary that may map keys 'rmax', 'rmin', 'dmax' to scalar `Tensors` used to clip the renorm correction. The correction `(r, d)` is used as `corrected_value = normalized_value * r + d`, with `r` clipped to [rmin, rmax], and `d` to [-dmax, dmax]. Missing rmax, rmin, dmax are set to inf, 0, inf, respectively.
renorm_decay: Momentum used to update the moving means and standard deviations with renorm. Unlike `momentum`, this affects training and should be neither too small (which would add noise) nor too large (which would give stale estimates). Note that `decay` is still applied to get the means and variances for inference.
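The plain-Python sketch below shows how is_training, decay and the moving statistics interact in a single-feature BN layer. The class `TinyBatchNorm` is hypothetical; it mimics the documented behaviour, not TensorFlow's code. Following the signature above, it has an offset beta (center=True), no gamma (scale=False) and epsilon=0.001. It assumes moving_mean starts at 0 and moving_variance at 1, and it uses the biased batch variance. In training mode it normalizes with the batch statistics and updates the moving statistics right away, roughly the updates_collections=None behaviour. In inference mode it uses the moving statistics.

```python
import math

class TinyBatchNorm:
    """Hypothetical single-feature BN layer with scale=False (no gamma)."""

    def __init__(self, decay=0.9, epsilon=0.001):
        self.decay = decay
        self.epsilon = epsilon
        self.beta = 0.0              # offset (center=True); would be learned
        self.moving_mean = 0.0       # assumed initial value
        self.moving_variance = 1.0   # assumed initial value

    def __call__(self, xs, is_training):
        if is_training:
            m = len(xs)
            mean = sum(xs) / m
            var = sum((x - mean) ** 2 for x in xs) / m
            # Update the moving statistics immediately (like updates_collections=None).
            d = self.decay
            self.moving_mean = d * self.moving_mean + (1 - d) * mean
            self.moving_variance = d * self.moving_variance + (1 - d) * var
        else:
            mean, var = self.moving_mean, self.moving_variance
        inv = 1.0 / math.sqrt(var + self.epsilon)
        return [(x - mean) * inv + self.beta for x in xs]

bn = TinyBatchNorm(decay=0.9)
for _ in range(100):                      # training batches with mean 5 and variance 4
    bn([3.0, 7.0, 3.0, 7.0], is_training=True)
print(round(bn.moving_mean, 3), round(bn.moving_variance, 3))    # → 5.0 4.0
print([round(y, 2) for y in bn([5.0, 7.0], is_training=False)])  # → [0.0, 1.0]
```

With decay=0.999 and the same 100 batches, moving_mean would only reach 5 × (1 - 0.999^100), about 0.48. This is the stale-statistics situation in which the decay description suggests lowering the value.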


Original: https://www.cnblogs.com/zyly/p/8996070.html
