The previous post, python利用梯度下降求多元线性回归, explained how to implement multivariate linear regression with gradient descent, but the model it fits is a linear function. This post builds on that one by adding a nonlinear unit, giving the simplest possible neural network.
The linear regression in the previous post fit y = Xw; here a sigmoid nonlinearity σ is applied on top of it, i.e. the model becomes y = σ(Xw).
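The sigmoid, and the standard identity that the deriv=True branch of the code below relies on, are:

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)

So when the value passed in is already an activation a = σ(z), the derivative can be computed as a·(1 − a), which is why nonlin(x, deriv=True) simply returns x*(1-x).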
import numpy as np

# Nonlinear activation (sigmoid). deriv=False returns f(x);
# deriv=True returns f'(x), assuming x is already the sigmoid output.
def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)
    else:
        return 1 / (1 + np.exp(-x))

x = np.array([[0, 0, 1],
              [1, 1, 1],
              [1, 0, 1],
              [0, 1, 1]])
y = np.array([[0, 1, 1, 0]]).T

mu, sigma = 0, 0.1  # mean and standard deviation
w = np.random.normal(mu, sigma, (3, 1))
iter_size = 1000
lr = 1

for i in range(iter_size):
    # (data_num, weight_num)
    L0 = x
    # (data_num, weight_num) * (weight_num, 1) = (data_num, 1)
    L1 = nonlin(L0.dot(w))
    # (data_num, 1)
    L1_loss = L1 - y
    # (data_num, 1)
    L1_delta = L1_loss * nonlin(L1, True)
    # (weight_num, data_num) * (data_num, 1) = (weight_num, 1)
    grad = L0.T.dot(L1_delta) * lr
    w -= grad
print(L1)
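After the loop, L1 holds the network's outputs for the four training inputs. A quick sanity check (a minimal sketch reusing the x, w and nonlin defined above) is to threshold the sigmoid outputs at 0.5:

# Threshold the trained outputs and compare against the targets y
pred = (nonlin(x.dot(w)) > 0.5).astype(int)
print(pred.ravel())  # should match y.ravel(), i.e. [0 1 1 0]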
Next, add a hidden layer (3 inputs → 5 hidden units → 1 output) to get a two-layer network:

import numpy as np

# Nonlinear activation (sigmoid). deriv=False returns f(x);
# deriv=True returns f'(x), assuming x is already the sigmoid output.
def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)
    else:
        return 1 / (1 + np.exp(-x))

x = np.array([[0, 0, 1],
              [1, 1, 1],
              [1, 0, 1],
              [0, 1, 1]])
y = np.array([[0, 1, 1, 0]]).T

mu, sigma = 0, 0.1  # mean and standard deviation
w0 = np.random.normal(mu, sigma, (3, 5))
w1 = np.random.normal(mu, sigma, (5, 1))
iter_size = 10000
lr = 1

for i in range(iter_size):
    # Forward pass
    # (data_num, weight_num_0)
    L0 = x
    # (data_num, weight_num_0) * (weight_num_0, weight_num_1) = (data_num, weight_num_1)
    L1 = nonlin(L0.dot(w0))
    # (data_num, weight_num_1) * (weight_num_1, 1) = (data_num, 1)
    L2 = nonlin(L1.dot(w1))

    # Backward pass
    # (data_num, 1)
    L2_loss = L2 - y
    # (data_num, 1)
    L2_delta = L2_loss * nonlin(L2, True)
    # (data_num, 1) * (1, weight_num_1) = (data_num, weight_num_1)
    # how much each L1 unit contributed to L2_loss: the gradient flowing back
    # is weighted by w1 (the pre-update w1)
    L1_loss = L2_delta.dot(w1.T)
    # (data_num, weight_num_1)
    L1_delta = L1_loss * nonlin(L1, True)

    # Weight gradients and updates
    # (weight_num_1, data_num) * (data_num, 1) = (weight_num_1, 1)
    grad1 = L1.T.dot(L2_delta)
    # (weight_num_0, data_num) * (data_num, weight_num_1) = (weight_num_0, weight_num_1)
    grad0 = L0.T.dot(L1_delta)
    w1 -= grad1 * lr
    w0 -= grad0 * lr
print(L2)
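For reference, the delta terms above are just chain-rule bookkeeping (a sketch, assuming the loss E = ½·Σ(L2 − y)², writing z1 = L0·w0, z2 = L1·w1, and ⊙ for element-wise multiplication):

\begin{aligned}
L2\_delta &= \frac{\partial E}{\partial z_2} = (L_2 - y)\odot L_2(1-L_2)\\
grad1 &= \frac{\partial E}{\partial w_1} = L_1^{\top}\,L2\_delta\\
L1\_loss &= \frac{\partial E}{\partial L_1} = L2\_delta\;w_1^{\top}\\
L1\_delta &= \frac{\partial E}{\partial z_1} = L1\_loss\odot L_1(1-L_1)\\
grad0 &= \frac{\partial E}{\partial w_0} = L_0^{\top}\,L1\_delta
\end{aligned}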
To add dropout on the hidden layer, modify the forward pass as follows (do_dropout toggles dropout during training, dropout_percent is the drop probability, and w1_dim is the hidden-layer width, 5 in the code above):

......
    L1 = nonlin(L0.dot(w0))
    # insert the following right after L1 is computed
    if do_dropout:
        L1 *= np.random.binomial([np.ones((len(x), w1_dim))], 1 - dropout_percent)[0] * (1.0 / (1 - dropout_percent))
    # (data_num, weight_num_1) * (weight_num_1, 1) = (data_num, 1)
    L2 = nonlin(L1.dot(w1))
......
Detailed explanation of the code above:

L1 has shape (data_num, w1_dim).

np.random.binomial([np.ones((len(x), w1_dim))], 1 - dropout_percent)[0]

[np.ones((len(x), w1_dim))] is an all-ones array of shape (data_num, w1_dim) wrapped in a list, which is why the result is unwrapped with [0] at the end.

return_value = np.random.binomial(n, p, size=None) draws from a binomial distribution. Think of a bag of black and white balls where the probability of drawing a black ball is p: draw n times with replacement, and return_value is the number of black balls drawn; size is how many such experiments to run and can usually be omitted.

For example, the probability of getting heads twice in two consecutive coin flips:

print(sum(np.random.binomial(2, 0.5, size=2000) == 2) / 2000.)  # note the "." after 2000, which keeps the division in floating point
0.2505  # over 2000 repetitions this comes out close to the theoretical value 0.5 * 0.5 = 0.25

Here, in [np.ones((len(x), w1_dim))], every one of the len(x) rows and w1_dim columns is a 1, i.e. a single draw per entry: the probability of drawing a 1 is (1 - dropout_percent) and the probability of drawing a 0 is dropout_percent. In other words, each value in L1 is zeroed out with probability dropout_percent.

Another important point is the trailing factor

* (1.0 / (1 - dropout_percent))

Someone else's explanation:

A simple intuition is that if you're turning off half of your hidden layer, you want to double the values that ARE pushing forward so that the output compensates correctly. Many thanks to @karpathy for catching this one.

In other words, having dropped out some of the hidden-layer values, you scale up the surviving ones to compensate, as the small sketch below illustrates.
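A small numerical illustration of that compensation (hypothetical values: dropout_percent = 0.5 and a constant 4×5 activation matrix standing in for L1):

import numpy as np

dropout_percent = 0.5
L1 = np.full((4, 5), 0.8)  # stand-in for the hidden activations
# each entry survives with probability 1 - dropout_percent and is scaled by 1 / (1 - dropout_percent)
mask = np.random.binomial(1, 1 - dropout_percent, size=L1.shape) * (1.0 / (1 - dropout_percent))
dropped = L1 * mask        # roughly half the entries become 0, the rest become 1.6
print(dropped.mean())      # each entry is 0.8 in expectation, so the scale of L1 is preserved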
Original post: http://blog.csdn.net/u013010889/article/details/70904158