The computation is as follows:
\begin{equation}
\begin{array}{l}{x_{1}=w_{1} * \text{input}} \\ {x_{2}=w_{2} * x_{1}} \\ {x_{3}=w_{3} * x_{2}}\end{array}
\end{equation}
Here $w_{1}$, $w_{2}$, and $w_{3}$ are weight parameters that require gradients. They are initialized to 1, 0, and 1, respectively.
The code is as follows:
import torch

input_data = torch.randn(1)

# Weights initialized to 1, 0, 1; all three track gradients
weight1 = torch.ones(1, requires_grad=True)
weight2 = torch.zeros(1, requires_grad=True)
weight3 = torch.ones(1, requires_grad=True)

x_1 = weight1 * input_data
x_2 = weight2 * x_1
x_3 = weight3 * x_2

# Multiplying by one leaves the value and the gradients unchanged
one = torch.ones(1)
x_3 = x_3 * one

# x_3 has a single element, so backward() needs no explicit gradient argument
x_3.backward()

print("x1:{}, x2:{}, x3:{}, weight1_grad:{}, weight2_grad:{}, weight3_grad:{}".format(
    x_1, x_2, x_3, weight1.grad, weight2.grad, weight3.grad))
In one run, the randomly generated input_data was 1.688, and the gradients of the three weights were 0, 1.688, and 0, respectively.
The gradients are computed as follows:
\begin{equation}
\frac{\partial x_{3}}{\partial w_{3}}=x_{2}
\end{equation}
\begin{equation}
\frac{\partial x_{3}}{\partial x_{2}}=w_{3}
\end{equation}
\begin{equation}
\frac{\partial x_{3}}{\partial w_{2}}=\frac{\partial x_{3}}{\partial x_{2}} \frac{\partial x_{2}}{\partial w_{2}}=w_{3} * x_{1}
\end{equation}
\begin{equation}
\frac{\partial x_{3}}{\partial x_{1}}=\frac{\partial x_{3}}{\partial x_{2}} \frac{\partial x_{2}}{\partial x_{1}}=w_{3} * w_{2}
\end{equation}
\begin{equation}
\frac{\partial x_{3}}{\partial w_{1}}=\frac{\partial x_{3}}{\partial x_{1}} \frac{\partial x_{1}}{\partial w_{1}}=w_{3} * w_{2} * \text{input}
\end{equation}
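This shows that a weight being 0 does not mean its gradient is 0, and a weight being nonzero does not mean its gradient is nonzero. Here $w_{2}$ is 0, yet its gradient is $w_{3} * x_{1} = 1.688$, while $w_{1}$ and $w_{3}$ are both 1, yet their gradients are 0 because each one contains the factor $w_{2}$.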
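The formulas can be checked against autograd directly. The following is a minimal sketch, assuming a fixed input of 1.688 (the value from the run above) in place of `torch.randn`; the short names `w1`, `x1`, `expected_w1`, and so on are chosen here for illustration only.

```python
import torch

# Fixed input instead of torch.randn, so the check is reproducible
input_data = torch.tensor([1.688])
w1 = torch.ones(1, requires_grad=True)
w2 = torch.zeros(1, requires_grad=True)
w3 = torch.ones(1, requires_grad=True)

x1 = w1 * input_data
x2 = w2 * x1
x3 = w3 * x2
x3.backward()

# Gradients written out by hand from the chain-rule formulas
expected_w3 = x2.detach()                            # dx3/dw3 = x2
expected_w2 = w3.detach() * x1.detach()              # dx3/dw2 = w3 * x1
expected_w1 = w3.detach() * w2.detach() * input_data  # dx3/dw1 = w3 * w2 * input

grads_match = (
    torch.allclose(w1.grad, expected_w1)
    and torch.allclose(w2.grad, expected_w2)
    and torch.allclose(w3.grad, expected_w3)
)
print(w1.grad, w2.grad, w3.grad, grads_match)
```

Only `w2`, whose value is 0, ends up with a nonzero gradient, because it is the only weight whose gradient does not contain `w2` as a factor.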
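When modifying a model, it is common to set some convolution kernels to zero. However, if those kernels still have requires_grad=True, they can still be updated during backpropagation and drift away from zero.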
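This effect can be reproduced on a small scale. The sketch below is an illustration under stated assumptions, not the setup of any particular model: it uses a single `nn.Conv2d` layer with one kernel, a plain SGD step, and the sum of the outputs as a stand-in loss. The helper name `step_zeroed_conv` is hypothetical. The bias is left trainable so that `backward()` still has something to differentiate when the weight is frozen.

```python
import torch
import torch.nn as nn


def step_zeroed_conv(freeze):
    """Zero a conv kernel, take one SGD step, return the kernel's L1 norm."""
    torch.manual_seed(0)
    conv = nn.Conv2d(1, 1, kernel_size=3, bias=True)
    with torch.no_grad():
        conv.weight.zero_()  # set the kernel to zero

    # Assumption for this sketch: freezing is done on the whole weight tensor
    conv.weight.requires_grad_(not freeze)

    trainable = [p for p in conv.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=0.1)

    x = torch.randn(1, 1, 5, 5)
    loss = conv(x).sum()  # stand-in loss, only used to produce gradients
    loss.backward()
    optimizer.step()

    return conv.weight.abs().sum().item()


norm_trainable = step_zeroed_conv(freeze=False)
norm_frozen = step_zeroed_conv(freeze=True)
print("kernel norm with requires_grad=True: ", norm_trainable)
print("kernel norm with requires_grad=False:", norm_frozen)
```

With `requires_grad=True` the zeroed kernel receives a gradient that depends on the input, not on the kernel's own value, so one optimizer step already moves it away from zero. With `requires_grad=False` the kernel is excluded from the update and stays at zero.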
Original post: https://www.cnblogs.com/yanxingang/p/10798126.html