
Derivation of the Backpropagation Algorithm for Fully Connected Neural Networks


Fully connected neural network

(Figure: diagram of a fully connected neural network)

Preface:

This article offers some paths toward understanding the backpropagation algorithm. Complete derivations of backpropagation are rare; most treatments present a few formulas without any detailed derivation, which is very unfriendly to beginners without a mathematics background.

This article sidesteps the Hadamard product and instead explains and derives everything with Jacobian matrices (see multivariable calculus). This may look more involved, but it should be friendlier to beginners who have only studied linear algebra.

My ability is limited; corrections are welcome.

0. Definitions

1. The input and output of every layer are one-dimensional column vectors. If the previous layer's output is a \(k \times 1\) column vector and the current layer's output is a \(j \times 1\) column vector, then the weight matrix is \(j \times k\) and the bias is \(j \times 1\).
2. Pre-activation output of layer \(l\): \(z^l = W^l a^{l-1} + b^l\)
3. Post-activation output of layer \(l\): \(a^l = \sigma(z^l)\)
4. Loss function: we take \(C = \frac{1}{2}\|a^l - y\|_2^2\) as the running example.
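To make these definitions concrete, here is a minimal NumPy sketch of a single layer's forward pass; the layer sizes, the random initialization, and the sigmoid activation are illustrative assumptions, not prescribed by the derivation.

```python
import numpy as np

def sigmoid(z):
    # σ(z) applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
k, j = 4, 3                            # previous layer: k×1, current layer: j×1 (illustrative sizes)

a_prev = rng.standard_normal((k, 1))   # a^{l-1}: k×1 column vector
W = rng.standard_normal((j, k))        # W^l: j×k weight matrix
b = rng.standard_normal((j, 1))        # b^l: j×1 bias

z = W @ a_prev + b                     # pre-activation output: z^l = W^l a^{l-1} + b^l
a = sigmoid(z)                         # post-activation output: a^l = σ(z^l)
```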

1. The intermediate quantity \(\delta^l\)

For computational convenience we introduce the intermediate quantity \(\delta^l\), called the \(\delta\)-error of layer \(l\); it denotes the partial derivative of the loss function with respect to the pre-activation output of layer \(l\) of the network, i.e.:

\[
\delta^l = \frac{\partial C}{\partial z^l} = \frac{\partial C}{\partial a^l}\frac{\partial a^l}{\partial z^l} = (a^l - y) * \sigma'(z^l) \qquad ①
\]

Note: '*' denotes the Hadamard product, i.e. elementwise multiplication, to be distinguished from matrix multiplication.
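Continuing the sketch above, and assuming the sigmoid activation (for which \(\sigma'(z) = \sigma(z)(1 - \sigma(z))\)), formula ① is a single elementwise line; the target `y` here is an arbitrary illustrative vector.

```python
def sigmoid_prime(z):
    # σ'(z) for the sigmoid: σ(z)(1 − σ(z)), elementwise
    s = sigmoid(z)
    return s * (1.0 - s)

y = rng.standard_normal((j, 1))        # target vector (illustrative)
delta = (a - y) * sigmoid_prime(z)     # ①: '*' is the Hadamard product, not a matrix product
```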

2. Weight matrix and bias

\[
\begin{aligned}
& \frac{\partial C}{\partial W^l} = \frac{\partial C}{\partial z^l}\frac{\partial z^l}{\partial W^l} = \delta^l (a^{l-1})^T \qquad ②\\
& \frac{\partial C}{\partial b^l} = \frac{\partial C}{\partial z^l}\frac{\partial z^l}{\partial b^l} = \delta^l \cdot 1 = \delta^l \qquad ③
\end{aligned}
\]
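In the same sketch, formulas ② and ③ become one outer product and one copy:

```python
dW = delta @ a_prev.T                  # ②: ∂C/∂W^l = δ^l (a^{l-1})^T, shape j×k
db = delta                             # ③: ∂C/∂b^l = δ^l, shape j×1
```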

3. The previous layer's error \(\delta^{l-1}\)

\[
\begin{aligned}
& \delta^{l-1} = \frac{\partial C}{\partial z^{l-1}} = \frac{\partial C}{\partial z^l}\frac{\partial z^l}{\partial z^{l-1}} = \delta^l \frac{\partial z^l}{\partial z^{l-1}}\\
& \because\ z^l = W^l a^{l-1} + b^l = W^l \sigma(z^{l-1}) + b^l\\
& \therefore\ \delta^{l-1} = (W^l)^T \delta^l * \sigma'(z^{l-1}) \qquad ④
\end{aligned}
\]
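Formula ④ pushes the error back one layer. The sketch below assumes the previous layer also uses the sigmoid; `z_prev` stands in for \(z^{l-1}\), which in a real network would come from the forward pass rather than being sampled at random.

```python
z_prev = rng.standard_normal((k, 1))                  # z^{l-1} (illustrative stand-in)
delta_prev = (W.T @ delta) * sigmoid_prime(z_prev)    # ④: matrix product, then Hadamard product
```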

Proceeding in this way, the error of every layer except the input layer can be obtained.

4. Updating the parameters with gradient descent

Once the \(\delta\)-error of every layer has been obtained, formulas ② and ③ give the gradient of the loss function C with respect to the parameters of each layer:

\[
\begin{aligned}
& \frac{\partial C}{\partial W^l} = \frac{\partial C}{\partial z^l}\frac{\partial z^l}{\partial W^l} = \delta^l (a^{l-1})^T\\
& \frac{\partial C}{\partial b^l} = \frac{\partial C}{\partial z^l}\frac{\partial z^l}{\partial b^l} = \delta^l \cdot 1 = \delta^l
\end{aligned}
\]

Update the parameters:

\[
\begin{aligned}
& W^l = W^l - \eta \frac{\partial C}{\partial W^l}\\
& b^l = b^l - \eta \frac{\partial C}{\partial b^l}
\end{aligned}
\]
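Putting ①–④ together, one gradient-descent step on a whole network might look like the following sketch. The `train_step` helper, the two-layer shapes in the usage example, and the learning rate \(\eta = 0.1\) are all illustrative assumptions.

```python
def train_step(params, x, y, eta=0.1):
    """One gradient-descent step on a fully connected net with sigmoid activations.
    params is a list of (W, b) pairs, one per layer."""
    # forward pass, caching z^l and a^l for every layer
    a, zs, activations = x, [], [x]
    for W, b in params:
        z = W @ a + b
        a = sigmoid(z)
        zs.append(z)
        activations.append(a)
    # backward pass: ① at the output layer, then ②③④ layer by layer
    delta = (a - y) * sigmoid_prime(zs[-1])                  # ①
    for i in reversed(range(len(params))):
        W, b = params[i]
        dW = delta @ activations[i].T                        # ②
        db = delta                                           # ③
        if i > 0:
            delta = (W.T @ delta) * sigmoid_prime(zs[i - 1]) # ④
        params[i] = (W - eta * dW, b - eta * db)             # gradient descent update
    return params

# illustrative usage: a 4 → 3 → 2 network
params = [(rng.standard_normal((3, 4)), rng.standard_normal((3, 1))),
          (rng.standard_normal((2, 3)), rng.standard_normal((2, 1)))]
x_in, y_t = rng.standard_normal((4, 1)), rng.standard_normal((2, 1))
params = train_step(params, x_in, y_t)
```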

The above is based on: link

Detailed explanation of the formulas (1, 2, 3)

1. The derivations above are somewhat difficult, but to learn deep learning well, understanding these formulas is indispensable.

2. Given my own limits, this is as far as my understanding goes; corrections are welcome if anything is wrong.

Background needed to understand the following:

1. Advanced mathematics: multivariable calculus

2. Linear algebra

3. Jacobian matrices and the chain rule (listed separately for a reason)

1. The intermediate quantity \(\delta^l\)

\[
\delta^l = \frac{\partial C}{\partial z^l} = \frac{\partial C}{\partial a^l}\frac{\partial a^l}{\partial z^l} = (a^l - y) * \sigma'(z^l) \qquad ①
\]

Preliminaries

\[
\begin{aligned}
& C = \frac{1}{2}\|a^l - y\|_2^2 = \frac{1}{2}\left[(a_1^l - y_1)^2 + (a_2^l - y_2)^2 + \dots + (a_j^l - y_j)^2\right]\\
& a^l = \sigma(z^l)\\
& z^l = \left[\begin{matrix} z_1^l & z_2^l & \dots & z_j^l \end{matrix}\right]^T
\end{aligned}
\]

Derivation

\[
\begin{aligned}
\delta^l = \frac{\partial C}{\partial z^l}
&= \left[\begin{matrix} \frac{\partial C}{\partial z_1^l} & \frac{\partial C}{\partial z_2^l} & \dots & \frac{\partial C}{\partial z_j^l} \end{matrix}\right]^T \quad (\text{Jacobian matrix})\\
&= \left[\begin{matrix} \frac{\partial C}{\partial a_1^l}\frac{\partial a_1^l}{\partial z_1^l} & \frac{\partial C}{\partial a_2^l}\frac{\partial a_2^l}{\partial z_2^l} & \dots & \frac{\partial C}{\partial a_j^l}\frac{\partial a_j^l}{\partial z_j^l} \end{matrix}\right]^T\\
&= \left[\begin{matrix} (a_1^l - y_1)\sigma'(z_1^l) & (a_2^l - y_2)\sigma'(z_2^l) & \dots & (a_j^l - y_j)\sigma'(z_j^l) \end{matrix}\right]^T\\
&= (a^l - y) * \sigma'(z^l)
\end{aligned}
\]
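A quick way to convince yourself of this componentwise derivation is to check it numerically: perturb each \(z_i^l\), difference the loss, and compare against ①. A minimal sketch, reusing the variables from the code above (the step size `eps` is an assumption):

```python
def loss(a_out, y):
    # C = ½‖a^l − y‖²
    return 0.5 * np.sum((a_out - y) ** 2)

eps = 1e-6
delta_numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    z_plus, z_minus = z.copy(), z.copy()
    z_plus[i] += eps
    z_minus[i] -= eps
    # central finite difference of C with respect to z_i^l
    delta_numeric[i] = (loss(sigmoid(z_plus), y) - loss(sigmoid(z_minus), y)) / (2 * eps)

print(np.allclose(delta_numeric, (a - y) * sigmoid_prime(z)))  # True, up to floating-point error
```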

2. Weight matrix and bias

Weight matrix

\[
\frac{\partial C}{\partial W^l} = \frac{\partial C}{\partial z^l}\frac{\partial z^l}{\partial W^l} = \delta^l (a^{l-1})^T \qquad ②
\]

Preliminaries

\[
\begin{aligned}
& C = \frac{1}{2}\|a^l - y\|_2^2 = \frac{1}{2}\left[(a_1^l - y_1)^2 + (a_2^l - y_2)^2 + \dots + (a_j^l - y_j)^2\right]\\
& a^l = \sigma(z^l)\\
& a^l = \left[\begin{matrix} a_1^l & a_2^l & \dots & a_j^l \end{matrix}\right]^T \qquad a^{l-1} = \left[\begin{matrix} a_1^{l-1} & a_2^{l-1} & \dots & a_k^{l-1} \end{matrix}\right]^T\\
& z^l = \left[\begin{matrix} z_1^l & z_2^l & \dots & z_j^l \end{matrix}\right]^T\\
& W_{j \times k}^l = \left[\begin{matrix} w_{11} & w_{12} & \dots & w_{1k}\\ w_{21} & w_{22} & \dots & w_{2k}\\ \vdots & \vdots & \ddots & \vdots\\ w_{j1} & w_{j2} & \dots & w_{jk} \end{matrix}\right] = \left[\begin{matrix} w_1^l & w_2^l & \dots & w_j^l \end{matrix}\right]^T\\
& a_j^l = \sigma\Big(\sum_{k} w_{jk}^l a_k^{l-1} + b_j^l\Big) = \sigma(z_j^l)\\
& z_j^l = \sum_{k} w_{jk}^l a_k^{l-1} + b_j^l
\end{aligned}
\]

Here \(w_i^l\) denotes the \(i\)-th row of \(W^l\).

Derivation

\[
\begin{aligned}
\frac{\partial C}{\partial W^l}
&= \left[\begin{matrix} \frac{\partial C}{\partial a_1^l}\frac{\partial a_1^l}{\partial z_1^l}\frac{\partial z_1^l}{\partial w_1^l} & \frac{\partial C}{\partial a_2^l}\frac{\partial a_2^l}{\partial z_2^l}\frac{\partial z_2^l}{\partial w_2^l} & \dots & \frac{\partial C}{\partial a_j^l}\frac{\partial a_j^l}{\partial z_j^l}\frac{\partial z_j^l}{\partial w_j^l} \end{matrix}\right]^T\\
&= \left[\begin{matrix}
\frac{\partial C}{\partial a_1^l}\frac{\partial a_1^l}{\partial z_1^l}\frac{\partial z_1^l}{\partial w_{11}^l} & \frac{\partial C}{\partial a_1^l}\frac{\partial a_1^l}{\partial z_1^l}\frac{\partial z_1^l}{\partial w_{12}^l} & \dots & \frac{\partial C}{\partial a_1^l}\frac{\partial a_1^l}{\partial z_1^l}\frac{\partial z_1^l}{\partial w_{1k}^l}\\
\frac{\partial C}{\partial a_2^l}\frac{\partial a_2^l}{\partial z_2^l}\frac{\partial z_2^l}{\partial w_{21}^l} & \frac{\partial C}{\partial a_2^l}\frac{\partial a_2^l}{\partial z_2^l}\frac{\partial z_2^l}{\partial w_{22}^l} & \dots & \frac{\partial C}{\partial a_2^l}\frac{\partial a_2^l}{\partial z_2^l}\frac{\partial z_2^l}{\partial w_{2k}^l}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial C}{\partial a_j^l}\frac{\partial a_j^l}{\partial z_j^l}\frac{\partial z_j^l}{\partial w_{j1}^l} & \frac{\partial C}{\partial a_j^l}\frac{\partial a_j^l}{\partial z_j^l}\frac{\partial z_j^l}{\partial w_{j2}^l} & \dots & \frac{\partial C}{\partial a_j^l}\frac{\partial a_j^l}{\partial z_j^l}\frac{\partial z_j^l}{\partial w_{jk}^l}
\end{matrix}\right] \quad (\text{Jacobian matrix})\\
&= \delta^l (a^{l-1})^T
\end{aligned}
\]

The last step holds because \(\frac{\partial z_i^l}{\partial w_{im}^l} = a_m^{l-1}\), so entry \((i, m)\) of the matrix is \(\delta_i^l\, a_m^{l-1}\), which is exactly the outer product \(\delta^l (a^{l-1})^T\).
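The same numerical check works for ②: perturbing a single weight \(w_{im}\) and differencing the loss should reproduce entry \((i, m)\) of \(\delta^l (a^{l-1})^T\). A sketch, continuing from the variables above:

```python
dW_numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for m in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, m] += eps
        W_minus[i, m] -= eps
        # central finite difference of C with respect to w_im
        c_plus = loss(sigmoid(W_plus @ a_prev + b), y)
        c_minus = loss(sigmoid(W_minus @ a_prev + b), y)
        dW_numeric[i, m] = (c_plus - c_minus) / (2 * eps)

print(np.allclose(dW_numeric, delta @ a_prev.T))  # True: the Jacobian is exactly the outer product
```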

Bias

By the same reasoning (this one should not be hard): since \(\frac{\partial z_i^l}{\partial b_i^l} = 1\), the Jacobian reduces to \(\frac{\partial C}{\partial b^l} = \delta^l\).

3. The previous layer's error \(\delta^{l-1}\)

\[
\delta^{l-1} = (W^l)^T \delta^l * \sigma'(z^{l-1}) \qquad ④
\]

Preliminaries

\[
\begin{aligned}
& z^{l-1} = \left[\begin{matrix} z_1^{l-1} & z_2^{l-1} & \dots & z_k^{l-1} \end{matrix}\right]^T\\
& C = \frac{1}{2}\|a^l - y\|_2^2 = \frac{1}{2}\left[(a_1^l - y_1)^2 + (a_2^l - y_2)^2 + \dots + (a_j^l - y_j)^2\right]\\
& a_j^l = \sigma\Big(\sum_{k} w_{jk}^l a_k^{l-1} + b_j^l\Big) = \sigma(z_j^l)\\
& z_j^l = \sum_{k} w_{jk}^l a_k^{l-1} + b_j^l\\
& a_k^{l-1} = \sigma(z_k^{l-1})
\end{aligned}
\]

Derivation

\[
\begin{aligned}
\delta^{l-1} = \frac{\partial C}{\partial z^{l-1}}
&= \left[\begin{matrix} \frac{\partial C}{\partial z_1^{l-1}} & \frac{\partial C}{\partial z_2^{l-1}} & \dots & \frac{\partial C}{\partial z_k^{l-1}} \end{matrix}\right]^T\\
&= \left[\begin{matrix}
\sum_{i=1}^{j} \frac{\partial C}{\partial a_i^l}\frac{\partial a_i^l}{\partial z_i^l}\frac{\partial z_i^l}{\partial a_1^{l-1}}\frac{\partial a_1^{l-1}}{\partial z_1^{l-1}} &
\sum_{i=1}^{j} \frac{\partial C}{\partial a_i^l}\frac{\partial a_i^l}{\partial z_i^l}\frac{\partial z_i^l}{\partial a_2^{l-1}}\frac{\partial a_2^{l-1}}{\partial z_2^{l-1}} &
\dots &
\sum_{i=1}^{j} \frac{\partial C}{\partial a_i^l}\frac{\partial a_i^l}{\partial z_i^l}\frac{\partial z_i^l}{\partial a_k^{l-1}}\frac{\partial a_k^{l-1}}{\partial z_k^{l-1}}
\end{matrix}\right]^T\\
&= (W^l)^T \delta^l * \sigma'(z^{l-1})
\end{aligned}
\]

Note: the summation appears because the loss C contains \([a_1^l\ a_2^l\ \dots\ a_j^l]\), and each previous-layer neuron output \(a_m^{l-1}\) influences every one of the outputs \(a_i^l\), so the multivariable chain rule sums the contributions over all \(i\). (If this is still unclear, look again at the fully connected network diagram and the formulas in the Preliminaries.)
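That summation over \(i\) is exactly what the matrix product \((W^l)^T \delta^l\) computes: row \(m\) of \((W^l)^T\) collects the weights \(w_{im}\) through which \(a_m^{l-1}\) reaches each \(a_i^l\). As before, ④ can be verified numerically; this sketch recomputes the last two layers' forward pass from the illustrative stand-in `z_prev` so that everything is consistent:

```python
# forward pass through the last two layers, starting from z^{l-1}
a_prev_fp = sigmoid(z_prev)                       # a^{l-1} = σ(z^{l-1})
z_out = W @ a_prev_fp + b                         # z^l
a_out = sigmoid(z_out)                            # a^l
delta_out = (a_out - y) * sigmoid_prime(z_out)    # ① at layer l

delta_prev_numeric = np.zeros_like(z_prev)
for m in range(z_prev.shape[0]):
    zp, zm = z_prev.copy(), z_prev.copy()
    zp[m] += eps
    zm[m] -= eps
    # central finite difference of C with respect to z_m^{l-1}
    c_plus = loss(sigmoid(W @ sigmoid(zp) + b), y)
    c_minus = loss(sigmoid(W @ sigmoid(zm) + b), y)
    delta_prev_numeric[m] = (c_plus - c_minus) / (2 * eps)

print(np.allclose(delta_prev_numeric, (W.T @ delta_out) * sigmoid_prime(z_prev)))  # True
```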

Closing remarks:

That is about all for this article. Overall it offers some paths toward understanding these formulas, but to truly understand them there is no way around working through the derivation on paper from beginning to end.

2021.7.26 ghb


Original: https://www.cnblogs.com/430442-CmjAndGhb/p/15062772.html
