1. Define the batching scheme ret, which yields a batch_sizes array:
{'min_length': 8, 'window_size': 720,
 'shuffle_queue_size': 270,
 'boundaries': [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 33, 36, 39, 42, 46, 50, 55, 60, 66, 72, 79, 86, 94, 103, 113, 124, 136, 149, 163, 179, 196, 215, 236],
 'max_length': 256,
 'batch_sizes': [240, 180, 180, 180, 144, 144, 144, 120, 120, 120, 90, 90, 90, 90, 80, 72, 72, 60, 60, 48, 48, 48, 40, 40, 36, 30, 30, 24, 24, 20, 20, 18, 18, 16, 15, 12, 12, 10, 10, 9, 8, 8]}
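Where these numbers plausibly come from (a sketch only: the 2048-token budget is my inference from the arrays, and the divisor rounding mirrors tensor2tensor's _batching_scheme, so treat the details as assumptions):

token_budget = 2048                      # assumed tokens per batch (2048 // 8 = 256 -> 240)
boundaries = [8, 9, 10, 11, 12]          # first few bucket boundaries
raw = [token_budget // b for b in boundaries]    # [256, 227, 204, 186, 170]
# window_size = 720 is highly composite; each batch size is rounded down to a
# divisor of the window, so every window splits evenly into whole batches.
window_size = 720
divisors = [d for d in range(1, window_size + 1) if window_size % d == 0]
batch_sizes = [max(d for d in divisors if d <= r) for r in raw]
print(batch_sizes)                       # [240, 180, 180, 180, 144] -- matches the array above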
2. input_pipeline: read the 10 input files, decode each record (decode_record),
and assemble them into a dict-structured dataset: {"src_id": ..., "target_id": ...}
(1) Filter by length: keep an example based on the larger of its source-side and target-side sentence lengths:
length = _example_length(example)
return tf.logical_and(length >= min_length, length <= max_length)
dataset = dataset.filter(functools.partial(example_valid_size, min_length=batching_scheme["min_length"], max_length=batching_scheme["max_length"]))
filter evaluates this predicate on every element of the dataset and keeps only those for which it returns True; a runnable sketch follows.
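A sketch of step (1) (the dict keys follow the notes above; the body of _example_length is my assumption, matching "the larger of the two sides"):

import functools
import tensorflow as tf

def _example_length(example):
    # Length of an example = the longer of the source and target sides.
    return tf.maximum(tf.size(example["src_id"]), tf.size(example["target_id"]))

def example_valid_size(example, min_length, max_length):
    length = _example_length(example)
    return tf.logical_and(length >= min_length, length <= max_length)

def gen():  # toy data in the {"src_id": ..., "target_id": ...} structure
    yield {"src_id": [1] * 10, "target_id": [2] * 12}    # kept
    yield {"src_id": [1] * 300, "target_id": [2] * 280}  # dropped: longer than 256

dataset = tf.data.Dataset.from_generator(
    gen,
    output_types={"src_id": tf.int64, "target_id": tf.int64},
    output_shapes={"src_id": [None], "target_id": [None]})
dataset = dataset.filter(functools.partial(
    example_valid_size, min_length=8, max_length=256))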
(2) Pick a bucket id by length: given the dataset {"src_id": ..., "target_id": ...} and the boundaries list, compare each sentence's length against the bucket boundaries:
conditions_c = tf.logical_and(tf.less_equal(buckets_min, seq_length), tf.less(seq_length, buckets_max))
and return the index into boundaries where the length falls, i.e. the bucket id (filled out below).
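Filled out as a complete key function (this is essentially how tensor2tensor's bucket_by_sequence_length does it, using int32 min/max as sentinel edges):

import numpy as np

def example_to_bucket_id(example):
    seq_length = _example_length(example)
    boundaries = [8, 9, 10, 11, 12]  # batching_scheme["boundaries"] in the real pipeline
    buckets_min = [np.iinfo(np.int32).min] + boundaries
    buckets_max = boundaries + [np.iinfo(np.int32).max]
    conditions_c = tf.logical_and(
        tf.less_equal(buckets_min, seq_length),
        tf.less(seq_length, buckets_max))
    # Index of the single True condition = the bucket id, as an int64 key.
    bucket_id = tf.reduce_min(tf.where(conditions_c))
    return bucket_id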
Using the bucket id returned by the previous step, find the bucket and its window size. The English definition of a window is easier to follow; as I understand it, a window is like a basket that holds a fixed number of elements from the same bucket:
window_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func. Mutually exclusive with window_size_func.
tf.contrib.data.group_by_window(
    key_func,
    reduce_func,
    window_size=None,
    window_size_func=None
)
Defined in tensorflow/contrib/data/python/ops/grouping.py.
A transformation that groups windows of elements by key and reduces them.
This transformation maps each consecutive element in a dataset to a key using key_func and groups the elements by key. It then applies reduce_func to at most window_size_func(key) elements matching the same key. All except the final window for each key will contain window_size_func(key) elements; the final window may be smaller.
You may provide either a constant window_size or a window size determined by the key through window_size_func.
Args:
key_func: A function mapping a nested structure of tensors (having shapes and types defined by self.output_shapes and self.output_types) to a scalar tf.int64 tensor.
reduce_func: A function mapping a key and a dataset of up to window_size consecutive elements matching that key to another dataset.
window_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func. Mutually exclusive with window_size_func.
window_size_func: A function mapping a key to a tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func. Mutually exclusive with window_size.
Returns:
A Dataset transformation function, which can be passed to tf.data.Dataset.apply.
Raises:
ValueError: if neither or both of {window_size, window_size_func} are passed.
(3) Padding: grouped_dataset.padded_batch(batch_size, padded_shapes). Here grouped_dataset is the window of up to window_size same-bucket examples handed to reduce_func (per the API doc above), batch_size is the number of sentences per batch, and padded_shapes gives the dimensions to pad (the length dimension is padded to the longest sentence in the batch).
Putting it together, turning the id sequences into a matrix (full sketch below): dataset.apply(tf.contrib.data.group_by_window(example_to_bucket_id, batching_fn, None, window_size_fn))
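A sketch tying the three steps together (assuming the batching_scheme arrays and example_to_bucket_id from above; the per-bucket window size equals that bucket's batch size, so each window becomes exactly one padded batch):

def window_size_fn(bucket_id):
    # batching_scheme["batch_sizes"] in the real pipeline; truncated here.
    batch_sizes = tf.constant([240, 180, 180, 180, 144, 144], dtype=tf.int64)
    return batch_sizes[bucket_id]

def batching_fn(bucket_id, grouped_dataset):
    # grouped_dataset: up to window_size consecutive examples from one bucket.
    batch_size = window_size_fn(bucket_id)
    padded_shapes = {"src_id": [None], "target_id": [None]}  # pad length dim to the batch max
    return grouped_dataset.padded_batch(batch_size, padded_shapes)

dataset = dataset.apply(tf.contrib.data.group_by_window(
    example_to_bucket_id, batching_fn, None, window_size_fn))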
Part 2:
1-D convolution: https://blog.csdn.net/appleyuchi/article/details/78597054
tf.reshape: https://blog.csdn.net/lxg0807/article/details/53021859
list and tuple: https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/0014316724772904521142196b74a3f8abf93d8e97c6ee6000
expand_dims: https://blog.csdn.net/qq_31780525/article/details/72280284
tf.concat and tf.split: https://blog.csdn.net/momaojia/article/details/77603322 https://blog.csdn.net/UESTC_C2_403/article/details/73350457
feedforward: built from 1-D convolutions, with a ReLU nonlinearity between the two conv layers; then a residual connection adds the inputs, followed by layer normalization. Note it does not simply call layers.dense for a plain fully connected layer (a kernel-size-1 conv is equivalent to a position-wise dense layer); see the sketch below.
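A sketch of that feedforward block in TF1 style (normalize is the layer norm sketched under (1) below):

def feedforward(inputs, num_units=(2048, 512)):
    # Inner layer: kernel-size-1 conv1d == position-wise dense, with ReLU.
    outputs = tf.layers.conv1d(inputs, filters=num_units[0], kernel_size=1,
                               activation=tf.nn.relu, use_bias=True)
    # Readout layer: project back to the model dimension, no activation.
    outputs = tf.layers.conv1d(outputs, filters=num_units[1], kernel_size=1,
                               activation=None, use_bias=True)
    outputs += inputs               # residual connection
    outputs = normalize(outputs)    # layer normalization
    return outputs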
(1) normalization: normalized = (inputs - mean) / ((variance + epsilon) ** 0.5)
outputs = gamma * normalized + beta, where the mean and variance are taken per position over the last dimension (see the sketch after this block):
'''Applies layer normalization.
Args:
inputs: A tensor with 2 or more dimensions, where the first dimension has
`batch_size`.
epsilon: A floating number. A very small number for preventing ZeroDivision Error.
scope: Optional scope for `variable_scope`.
reuse: Boolean, whether to reuse the weights of a previous layer
by the same name.
Returns:
A tensor with the same shape and data dtype as `inputs`.
'''
What do beta and gamma actually do? They are trainable shift and scale parameters: at initialization (beta = 0, gamma = 1) they change nothing, but training can learn a per-feature rescaling.
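A minimal sketch of the normalize function implementing the two formulas above:

def normalize(inputs, epsilon=1e-8, scope="ln", reuse=None):
    with tf.variable_scope(scope, reuse=reuse):
        params_shape = inputs.get_shape()[-1:]
        # Per-position mean and variance over the last (feature) dimension.
        mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)
        beta = tf.get_variable("beta", params_shape, initializer=tf.zeros_initializer())
        gamma = tf.get_variable("gamma", params_shape, initializer=tf.ones_initializer())
        normalized = (inputs - mean) / ((variance + epsilon) ** 0.5)
        outputs = gamma * normalized + beta
    return outputs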
(2) embedding: uses TensorFlow's embedding lookup to map the input ids into dense vectors, so that words can carry learned relationships to one another.
The output has one more dimension than the input, with the last dimension equal to num_units (the number of hidden units).
When scale is True (the default), the outputs are scaled by sqrt(num_units); this matches the Transformer paper, which multiplies the embeddings by sqrt(d_model).
'''Embeds a given tensor.
Args:
inputs: A `Tensor` with type `int32` or `int64` containing the ids
to be looked up in `lookup table`.
vocab_size: An int. Vocabulary size.
num_units: An int. Number of embedding hidden units.
zero_pad: A boolean. If True, all the values of the first row (id 0)
should be constant zeros.
scale: A boolean. If True, the outputs are multiplied by sqrt(num_units).
scope: Optional scope for `variable_scope`.
reuse: Boolean, whether to reuse the weights of a previous layer
by the same name.
Returns:
A `Tensor` with one more rank than inputs's. The last dimensionality
should be `num_units`.
'''
A function used here works like a Chinese-to-English dictionary mapping (the lookup function, tf.nn.embedding_lookup): you pass in a tensor that acts as the dictionary (the lookup table), give the ids you want represented, and get back the corresponding tensor.
A blog that explains it well: https://www.jianshu.com/p/677e71364c8e ; it is related to one-hot encoding: https://blog.csdn.net/pipisorry/article/details/61193868
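A sketch matching the docstring above (the zero_pad concat forces row 0 of the table to zeros, so the padding id 0 always embeds to the zero vector):

def embedding(inputs, vocab_size, num_units, zero_pad=True, scale=True,
              scope="embedding", reuse=None):
    with tf.variable_scope(scope, reuse=reuse):
        lookup_table = tf.get_variable("lookup_table", [vocab_size, num_units],
                                       initializer=tf.contrib.layers.xavier_initializer())
        if zero_pad:
            lookup_table = tf.concat((tf.zeros([1, num_units]),
                                      lookup_table[1:, :]), axis=0)
        # The "dictionary" lookup: each id indexes a row of the table,
        # adding a trailing num_units dimension to the output.
        outputs = tf.nn.embedding_lookup(lookup_table, inputs)
        if scale:
            outputs = outputs * (num_units ** 0.5)
    return outputs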
(3) multi-head attention:
a. Fully connected projections for Q, K, and V via dense: the last dimension becomes num_units,
and outputs = activation(inputs * kernel + bias).
b. The masking step: use reduce_sum to find the all-zero (padding) positions, then mask them by setting their attention scores to a very large negative number, marking those positions so that softmax assigns them essentially zero weight; see the sketch below.
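A sketch of the key (padding) mask in b (it relies on zero_pad above: a padded position has an all-zero embedding, so its abs-sum is 0; the fill value is a huge negative number, not the dtype minimum):

def mask_keys(attention_scores, queries, keys):
    # attention_scores: (N, T_q, T_k); queries: (N, T_q, d); keys: (N, T_k, d)
    key_masks = tf.sign(tf.reduce_sum(tf.abs(keys), axis=-1))        # (N, T_k): 1=real, 0=pad
    key_masks = tf.tile(tf.expand_dims(key_masks, 1),
                        [1, tf.shape(queries)[1], 1])                # (N, T_q, T_k)
    paddings = tf.ones_like(attention_scores) * (-2 ** 32 + 1)       # ~ minus infinity
    # After softmax, the masked positions receive essentially zero weight.
    return tf.where(tf.equal(key_masks, 0), paddings, attention_scores)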
(4)dropout:
(5) label_smoothing: smooths the one-hot target distribution.
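A sketch (each one-hot 1 becomes 1 - epsilon + epsilon/V and each 0 becomes epsilon/V, so every row still sums to 1):

def label_smoothing(inputs, epsilon=0.1):
    # inputs: one-hot labels of shape (N, T, V), V = number of classes.
    V = inputs.get_shape().as_list()[-1]
    return ((1 - epsilon) * inputs) + (epsilon / V)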
(6) positional encoding: I still have some questions here.
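For reference, the sinusoidal encoding from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), as a small numpy sketch:

import numpy as np

def positional_encoding(max_len, num_units):
    # Dimension i uses exponent 2*(i//2)/num_units, so dims pair up as (sin, cos).
    pe = np.array([[pos / np.power(10000, 2.0 * (i // 2) / num_units)
                    for i in range(num_units)] for pos in range(max_len)])
    pe[:, 0::2] = np.sin(pe[:, 0::2])   # even dimensions
    pe[:, 1::2] = np.cos(pe[:, 1::2])   # odd dimensions
    return pe                            # (max_len, num_units), added to the embeddings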
I have been studying this for a while now, but many questions remain.
Original post: https://www.cnblogs.com/Shaylin/p/9918178.html