In a neural network the first layer is called the input layer, the last layer \(L\) is called the output layer, and the remaining layers \(l\) \((1<l<L)\) are called hidden layers.
Let the input vector be:
\(x = (x_1,x_2,...,x_i,...,x_m),\quad i = 1,2,...,m\)
the output vector:
\(y = (y_1, y_2,...,y_k,...,y_n),\quad k = 1,2,...,n\)
and the output of hidden layer \(l\):
\(h^{(l)} = (h^{(l)}_1,h^{(l)}_2,...,h^{(l)}_i,...,h^{(l)}_{s_l}), \quad i = 1,2,...,s_l\)
where \(s_l\) is the number of neurons in layer \(l\).
Let \(W_{ij}^{(l)}\) denote the weight connecting neuron \(i\) in layer \(l\) with neuron \(j\) in layer \(l-1\), and \(b_i^{(l)}\) the bias of neuron \(i\) in layer \(l\). Then:
\(h_i^{(l)} = f(net_i^{(l)})\)
\(net_i^{(l)} = \sum_{j=1}^{s_{l-1}} W_{ij}^{(l)}h_j^{(l-1)} + b_i^{(l)}\)
where \(net_i^{(l)}\) is the net input to the \(i\)-th neuron in layer \(l\), and \(f(x)\) is the activation function of the neurons:
\(f(x) = \frac{1}{1+e^{-x}}, \quad f'(x) = f(x)(1-f(x))\)
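As a quick numerical check of the derivative identity above, a minimal Python sketch (the function names are ours, not from the original post) can compare \(f(x)(1-f(x))\) against a central finite difference:

```python
import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # f'(x) = f(x) * (1 - f(x))
    fx = sigmoid(x)
    return fx * (1.0 - fx)

# Compare against a central finite difference at a few points
for x in (-2.0, 0.0, 1.5):
    eps = 1e-6
    numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
    assert abs(sigmoid_prime(x) - numeric) < 1e-8
```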
Suppose there are \(m\) training samples \(\{(x(1),y(1)), (x(2),y(2)), \ldots, (x(m),y(m))\}\), where \(y(i)\) denotes the network output for input \(x(i)\) and \(d(i)\) the corresponding desired output.
The error function is:
\[
E=\frac{1}{m}\sum_{i=1}^{m}E(i)
\]
where \(E(i)\) is the training error on a single sample:
\[
E(i) = \frac{1}{2}\sum^n_{k=1}(d_k(i) - y_k(i))^2, \qquad y_k(i) = h^{(L)}_k(i)
\]
Substituting gives:
\[
E = \frac{1}{2m}\sum_{i=1}^{m}\sum^n_{k=1}(d_k(i) - y_k(i))^2
\]
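The per-sample error and the averaged error above can be sketched directly in Python (the helper names `sample_error` and `total_error` are ours):

```python
def sample_error(d, y):
    # E(i) = 1/2 * sum_k (d_k(i) - y_k(i))^2  -- error on one sample
    return 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))

def total_error(D, Y):
    # E = (1/m) * sum_i E(i)  -- average over the m samples
    m = len(D)
    return sum(sample_error(d, y) for d, y in zip(D, Y)) / m

# Example: two samples with two output units each
D = [[1.0, 0.0], [0.0, 1.0]]   # desired outputs d(i)
Y = [[0.5, 0.5], [0.5, 0.5]]   # network outputs y(i)
E = total_error(D, Y)          # 0.25
```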
The weight update is:
\[
W_{ij}^{(l)} = W_{ij}^{(l)} - \alpha \frac{\partial E}{\partial W_{ij}^{(l)}}
\]
and the bias update:
\[
b_{i}^{(l)} = b_{i}^{(l)} - \alpha \frac{\partial E}{\partial b_{i}^{(l)}}
\]
where \(\alpha\) is the learning rate.
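Both update rules are the same elementwise step; a minimal sketch (the name `sgd_step` and the toy values are ours):

```python
def sgd_step(M, grad, alpha):
    # M_ij <- M_ij - alpha * dE/dM_ij, applied elementwise to a
    # matrix stored as a list of rows (biases work the same way).
    return [[v - alpha * g for v, g in zip(row, grow)]
            for row, grow in zip(M, grad)]

W = [[0.5, -0.2], [0.3, 0.8]]     # toy weight matrix
dW = [[1.0, 0.0], [0.0, -1.0]]    # toy gradient dE/dW
W_new = sgd_step(W, dW, alpha=0.1)
```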
For a single sample, the partial derivative of the error with respect to an output-layer weight is:
\[
\begin{aligned}
\frac{\partial E(i)}{\partial W_{kj}^{(L)}}
&= \frac{\partial}{\partial W_{kj}^{(L)}}\Big(\frac{1}{2}\sum^n_{k=1}(d_k(i) - y_k(i))^2\Big)\\
&= \frac{\partial}{\partial W_{kj}^{(L)}}\Big(\frac{1}{2}(d_k(i) - y_k(i))^2\Big)\\
&= -(d_k(i) - y_k(i))\frac{\partial y_k(i)}{\partial W_{kj}^{(L)}}\\
&= -(d_k(i) - y_k(i))\frac{\partial y_k(i)}{\partial net_k^{(L)}}\frac{\partial net_k^{(L)}}{\partial W_{kj}^{(L)}}\\
&= -(d_k(i) - y_k(i))f'(x)\big|_{x=net_k^{(L)}}\frac{\partial net_k^{(L)}}{\partial W_{kj}^{(L)}}\\
&= -(d_k(i) - y_k(i))f'(x)\big|_{x=net_k^{(L)}}\,h_j^{(L-1)}
\end{aligned}
\]
Hence:
\[
\frac{\partial E(i)}{\partial W_{kj}^{(L)} }
=-(d_k(i) - y_k(i))f'(x)|_{x=net_k^{(L)}}h_j^{(L-1)}
\]
Similarly:
\[
\frac{\partial E(i)}{\partial b_k^{(L)} }
=-(d_k(i) - y_k(i))f'(x)|_{x=net_k^{(L)}}
\]
Define:
\[
\delta_k^{(L)} = \frac{\partial E(i)}{\partial b_k^{(L)} }
\]
so that:
\[
\frac{\partial E(i)}{\partial W_{kj}^{(L)} } = \delta_k^{(L)}h_j^{(L-1)}
\]
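A toy numerical check of these output-layer formulas (a sketch with arbitrary values, not code from the original post) can compare \(\delta_k^{(L)} h_j^{(L-1)}\) against a finite-difference gradient:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One output neuron k fed by one hidden activation h_j (toy values)
W_kj, b_k, h_j, d_k = 0.4, 0.1, 0.7, 1.0

net_k = W_kj * h_j + b_k
y_k = sigmoid(net_k)

# delta_k^(L) = dE/db_k = -(d_k - y_k) f'(net_k), with f' = y(1 - y)
delta_k = -(d_k - y_k) * y_k * (1.0 - y_k)
grad_W = delta_k * h_j  # dE/dW_kj = delta_k * h_j^(L-1)

# Compare against a central finite difference of E = 1/2 (d_k - y_k)^2
def E(w):
    y = sigmoid(w * h_j + b_k)
    return 0.5 * (d_k - y) ** 2

eps = 1e-6
numeric = (E(W_kj + eps) - E(W_kj - eps)) / (2 * eps)
assert abs(grad_W - numeric) < 1e-8
```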
For the hidden layer \(L-1\):
\[
\begin{aligned}
\frac{\partial E(i)}{\partial W_{ji}^{(L-1)}}
&= \frac{\partial}{\partial W_{ji}^{(L-1)}}\Big(\frac{1}{2}\sum_{k=1}^{n} (d_k(i) - y_k(i))^2\Big)\\
&= \frac{\partial}{\partial W_{ji}^{(L-1)}}\Big(\frac{1}{2}\sum_{k=1}^{n} \Big(d_k(i) - f\Big(\sum_{j=1}^{s_{L-1}} W_{kj}^{(L)} h_j^{(L-1)} + b_k^{(L)}\Big)\Big)^2\Big)\\
&= \frac{\partial}{\partial W_{ji}^{(L-1)}}\Big(\frac{1}{2}\sum_{k=1}^{n} \Big(d_k(i) - f\Big(\sum_{j=1}^{s_{L-1}} W_{kj}^{(L)}
f\Big(\sum_{i=1}^{s_{L-2}} W_{ji}^{(L-1)} h_i^{(L-2)} + b_j^{(L-1)}\Big)
+ b_k^{(L)}\Big)\Big)^2\Big)\\
&= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)\big|_{x=net_k^{(L)}}\frac{\partial net_k^{(L)}}{\partial W_{ji}^{(L-1)}}
\end{aligned}
\]
where:
\[
\begin{aligned}
net_k^{(L)}
&= \sum_{j=1}^{s_{L-1}} W_{kj}^{(L)}h_j^{(L-1)} + b_k^{(L)}\\
&= \sum_{j=1}^{s_{L-1}} W_{kj}^{(L)} f(net_j^{(L-1)}) + b_k^{(L)}\\
&= \sum_{j=1}^{s_{L-1}} W_{kj}^{(L)} f\Big(\sum^{s_{L-2}}_{i=1} W_{ji}^{(L-1)} h_i^{(L-2)} + b_j^{(L-1)}\Big) + b_k^{(L)}
\end{aligned}
\]
Substituting gives:
\[
\begin{aligned}
\frac{\partial E(i)}{\partial W_{ji}^{(L-1)}}
&= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)\big|_{x=net_k^{(L)}}\frac{\partial net_k^{(L)}}{\partial W_{ji}^{(L-1)}}\\
&= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)\big|_{x=net_k^{(L)}} \frac{\partial net_k^{(L)}}{\partial f(net_j^{(L-1)})} \frac{\partial f(net_j^{(L-1)})}{\partial net_j^{(L-1)}} \frac{\partial net_j^{(L-1)}}{\partial W_{ji}^{(L-1)}}\\
&= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)\big|_{x=net_k^{(L)}} W_{kj}^{(L)} f'(x)\big|_{x=net_j^{(L-1)}}\, h_i^{(L-2)}
\end{aligned}
\]
Similarly:
\[
\frac{\partial E(i)}{\partial b_j^{(L-1)}}
= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)\big|_{x=net_k^{(L)}} W_{kj}^{(L)} f'(x)\big|_{x=net_j^{(L-1)}}
\]
Define:
\[
\delta_j^{(L-1)} = \frac{\partial E(i)}{\partial b_j^{(L-1)}}
\]
Then:
\[
\begin{aligned}
\delta_j^{(L-1)}
&= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)\big|_{x=net_k^{(L)}} W_{kj}^{(L)} f'(x)\big|_{x=net_j^{(L-1)}}\\
&= \sum^n_{k=1}\delta_k^{(L)} W_{kj}^{(L)} f'(x)\big|_{x=net_j^{(L-1)}}
\end{aligned}
\]
\[ \frac{\partial E(i)}{\partial W_{ji}^{(L-1)}} = \delta_j^{(L-1)}h_i^{(L-2)} \]
It follows that for any layer \(l\) \((1<l<L)\), the partial derivatives with respect to the weights and biases are:
\[
\begin{aligned}
\frac{\partial E(i)}{\partial W_{ji}^{(l)}} &= \delta_j^{(l)}h_i^{(l-1)}\\
\frac{\partial E(i)}{\partial b_j^{(l)}} &= \delta_j^{(l)}\\
\delta_j^{(l)} &= \sum_{k=1}^{s_{l+1}} \delta_k^{(l+1)} W_{kj}^{(l+1)}f'(x)\big|_{x=net_j^{(l)}}
\end{aligned}
\]
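These three recursions are the whole backward pass. A small pure-Python sketch (toy layer sizes and all names are ours) implements them and checks one gradient against a finite difference:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Tiny fully connected net: 2 inputs -> 3 hidden -> 2 outputs
random.seed(0)
sizes = [2, 3, 2]
W = [[[random.uniform(-1, 1) for _ in range(sizes[l])]
      for _ in range(sizes[l + 1])] for l in range(len(sizes) - 1)]
b = [[random.uniform(-1, 1) for _ in range(sizes[l + 1])]
     for l in range(len(sizes) - 1)]

def forward(x):
    # Returns the activations h^(0)..h^(L), with h^(0) = x
    acts = [x]
    for Wl, bl in zip(W, b):
        net = [sum(w * h for w, h in zip(row, acts[-1])) + bi
               for row, bi in zip(Wl, bl)]
        acts.append([sigmoid(n) for n in net])
    return acts

def backprop(x, d):
    acts = forward(x)
    y = acts[-1]
    # Output layer: delta_k = -(d_k - y_k) f'(net_k), with f' = y(1-y)
    delta = [-(dk - yk) * yk * (1 - yk) for dk, yk in zip(d, y)]
    grads_W, grads_b = [], []
    for l in range(len(W) - 1, -1, -1):
        # dE/dW_ji = delta_j * h_i^(l-1); dE/db_j = delta_j
        grads_W.insert(0, [[dj * hi for hi in acts[l]] for dj in delta])
        grads_b.insert(0, list(delta))
        if l > 0:
            # delta_j^(l) = sum_k delta_k^(l+1) W_kj^(l+1) f'(net_j^(l))
            h = acts[l]
            delta = [sum(W[l][k][j] * delta[k] for k in range(len(delta)))
                     * h[j] * (1 - h[j]) for j in range(len(h))]
    return grads_W, grads_b

# Finite-difference check of one first-layer weight gradient
x, d = [0.5, -0.3], [1.0, 0.0]
gW, gb = backprop(x, d)

def loss():
    y = forward(x)[-1]
    return 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))

eps = 1e-6
orig = W[0][1][0]
W[0][1][0] = orig + eps; e_plus = loss()
W[0][1][0] = orig - eps; e_minus = loss()
W[0][1][0] = orig
numeric = (e_plus - e_minus) / (2 * eps)
assert abs(gW[0][1][0] - numeric) < 1e-7
```

The backward loop deliberately mirrors the formulas: one line per recursion, with gradients prepended so `gW[l]` lines up with `W[l]`.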
The output-layer gradient can also be obtained directly as a chain of partial derivatives:
\[
\begin{aligned}
\frac{\partial E(i)}{\partial W_{kj}^{(L)}} &= \frac{\partial E(i)}{\partial h_k^{(L)}} \frac{\partial h_k^{(L)}}{\partial net_k^{(L)}} \frac{\partial net_k^{(L)}}{\partial W_{kj}^{(L)}}\\
&= -(d_k(i) - y_k(i))f'(x)\big|_{x=net_k^{(L)}}\,h_j^{(L-1)}
\end{aligned}
\]
Hence:
\[
\frac{\partial E(i)}{\partial W_{kj}^{(L)} }
=-(d_k(i) - y_k(i))f'(x)|_{x=net_k^{(L)}}h_j^{(L-1)}
\]
Taking the partial derivative with respect to the bias:
\[
\begin{aligned}
\frac{\partial E(i)}{\partial b_k^{(L)}}
&= \frac{\partial E(i)}{\partial h_k^{(L)}}
\frac{\partial h_k^{(L)}}{\partial net_k^{(L)}}
\frac{\partial net_k^{(L)}}{\partial b_k^{(L)}}\\
&= -(d_k(i) - y_k(i))f'(x)\big|_{x=net_k^{(L)}}
\end{aligned}
\]
Hence:
\[
\frac{\partial E(i)}{\partial b_k^{(L)} }
=-(d_k(i) - y_k(i))f'(x)|_{x=net_k^{(L)}}
\]
As before, define:
\[
\delta_k^{(L)} = \frac{\partial E(i)}{\partial b_k^{(L)} }
\]
so that:
\[
\frac{\partial E(i)}{\partial W_{kj}^{(L)} } = \delta_k^{(L)}h_j^{(L-1)}
\]
Taking the partial derivative with respect to a hidden-layer weight (summing over all output units \(k\) that the weight influences):
\[
\begin{aligned}
\frac{\partial E(i)}{\partial W_{ji}^{(L-1)}}
&= \sum^n_{k=1}
\frac{\partial E(i)}{\partial h_k^{(L)}}
\frac{\partial h_k^{(L)}}{\partial net_k^{(L)}}
\frac{\partial net_k^{(L)}}{\partial h_j^{(L-1)}}
\frac{\partial h_j^{(L-1)}}{\partial net_j^{(L-1)}}
\frac{\partial net_j^{(L-1)}}{\partial W_{ji}^{(L-1)}}\\
&= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)\big|_{x=net_k^{(L)}} W_{kj}^{(L)} f'(x)\big|_{x=net_j^{(L-1)}}\, h_i^{(L-2)}
\end{aligned}
\]
Taking the partial derivative with respect to the bias:
\[
\begin{aligned}
\frac{\partial E(i)}{\partial b_j^{(L-1)}}
&= \sum^n_{k=1}
\frac{\partial E(i)}{\partial h_k^{(L)}}
\frac{\partial h_k^{(L)}}{\partial net_k^{(L)}}
\frac{\partial net_k^{(L)}}{\partial h_j^{(L-1)}}
\frac{\partial h_j^{(L-1)}}{\partial net_j^{(L-1)}}
\frac{\partial net_j^{(L-1)}}{\partial b_j^{(L-1)}}\\
&= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)\big|_{x=net_k^{(L)}} W_{kj}^{(L)} f'(x)\big|_{x=net_j^{(L-1)}}
\end{aligned}
\]
Source: https://www.cnblogs.com/niubidexiebiao/p/10508145.html