MXNet is the foundation and Gluon is the high-level wrapper on top of it, much like TensorFlow and Keras. Thanks to MXNet's dynamic-graph mechanism, however, the two interoperate far more smoothly than TensorFlow and Keras do. The basic operations closely resemble PyTorch while often being more convenient, so anyone with PyTorch experience will find it easy to get started.
Typical imports:
from mxnet import ndarray as nd
from mxnet import autograd
from mxnet import gluon
import mxnet as mx
mxnet.ndarray is the foundation of the whole numerical computing stack; its API largely mirrors NumPy's ndarray, similar to PyTorch. Unlike PyTorch, which historically distinguished Variables from Tensors, MXNet keeps things simple with a single NDArray type, and mxnet.autograd can compute gradients on it directly, which is very convenient.
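As a quick illustration of the NumPy-like API (a minimal sketch; the shapes and values here are arbitrary):

from mxnet import ndarray as nd

a = nd.ones((2, 3))                        # 2x3 array of ones
b = nd.random.normal(scale=1, shape=(2, 3))  # standard-normal samples
c = a + b                                  # element-wise add, as in NumPy
d = nd.dot(a, b.T)                         # matrix product, shape (2, 2)
print(c.shape, d.shape)                    # (2, 3) (2, 2)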
x = nd.arange(4).reshape((4, 1))
# mark the variable as requiring gradients
x.attach_grad()
# automatic differentiation requires recording the computation graph
with autograd.record():
    y = 2 * nd.dot(x.T, x)
# backward pass
y.backward()
# read the gradient
print('x.grad: ', x.grad)
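Since y = 2 * xᵀx, the analytic gradient is dy/dx = 4x, so x.grad should equal 4 * x; a quick check:

assert (x.grad == 4 * x).asnumpy().all()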
nd.array([3.5]).asscalar()  # asscalar() converts a one-element NDArray to a Python scalar
import numpy as np
x = np.ones((2, 3))
y = nd.array(x)                 # convert NumPy to NDArray
z = y.asnumpy()                 # convert NDArray back to NumPy
out = nd.zeros_like(y)
nd.elemwise_add(y, y, out=out)  # in-place add: out must be a pre-allocated NDArray of matching shape
ReLU activation
def relu(X):
    return nd.maximum(X, 0)
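A quick sanity check (input values chosen arbitrarily):

print(relu(nd.array([-2.0, 0.0, 3.0])))   # -> [0. 0. 3.]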
Fully connected layer
# create the parameters
w = nd.random.normal(scale=1, shape=(num_inputs, 1))
b = nd.zeros(shape=(1,))
params = [w, b]
# attach gradient buffers to the parameters
for param in params:
    param.attach_grad()
# the fully connected (linear) layer itself
def net(X, w, b):
    return nd.dot(X, w) + b
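With num_inputs = 2 (as in the data example further down), this layer maps a batch of shape (batch_size, 2) to (batch_size, 1); a minimal sketch using the names defined above:

X = nd.random.normal(shape=(10, num_inputs))
print(net(X, w, b).shape)   # (10, 1)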
# mini-batch stochastic gradient descent: update each parameter in place
def sgd(params, lr, batch_size):
    for param in params:
        param[:] = param - lr * param.grad / batch_size
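One update step on a toy parameter shows what sgd does (a minimal sketch; the loss here is just the sum of squares):

w = nd.array([1.0, 2.0])
w.attach_grad()
with autograd.record():
    l = (w * w).sum()
l.backward()                     # w.grad is now 2 * w
sgd([w], lr=0.1, batch_size=1)
print(w)                         # [0.8 1.6]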
A complete example: first generate a synthetic linear-regression dataset.

import mxnet as mx
from mxnet import autograd, nd
import numpy as np

num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2
# features ~ N(0, 1); labels follow the true linear model plus small Gaussian noise
features = nd.random.normal(scale=1, shape=(num_examples, num_inputs))
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += nd.random.normal(scale=0.01, shape=labels.shape)
from mxnet.gluon import data as gdata

batch_size = 10
dataset = gdata.ArrayDataset(features, labels)
data_iter = gdata.DataLoader(dataset, batch_size, shuffle=True)
# peek at one mini-batch
for X, y in data_iter:
    print(X, y)
    break
[[-1.74047375  0.26071024]
 [ 0.65584248 -0.50490594]
 [-0.97745866 -0.01658815]
 [-0.55589193  0.30666101]
 [-0.61393601 -2.62473822]
 [ 0.82654613 -0.00791582]
 [ 0.29560572 -1.21692061]
 [-0.35985938 -1.37184834]
 [-1.69631028 -1.74014604]
 [ 1.31199837 -1.96280086]]
<NDArray 10x2 @cpu(0)>
[ -0.14842382   7.22247267   2.30917668   2.0601418   11.89551163
   5.87866735   8.94194221   8.15139961   6.72600317  13.50252151]
<NDArray 10 @cpu(0)>
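With the data iterator in place, the from-scratch pieces defined earlier (net, sgd, and the parameters) can be trained end to end. The squared_loss helper and the hyperparameters below are assumptions added here for illustration:

def squared_loss(y_hat, y):             # assumed helper: 0.5 * squared error per example
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

lr, num_epochs = 0.03, 3
w = nd.random.normal(scale=0.01, shape=(num_inputs, 1))
b = nd.zeros(shape=(1,))
for param in (w, b):
    param.attach_grad()

for epoch in range(num_epochs):
    for X, y in data_iter:
        with autograd.record():
            l = squared_loss(net(X, w, b), y)
        l.backward()                    # gradient of the summed batch loss
        sgd([w, b], lr, batch_size)
    print('epoch %d, loss %f' % (epoch + 1,
          squared_loss(net(features, w, b), labels).mean().asscalar()))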
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(1))
net.collect_params().initialize(mx.init.Normal(sigma=1))  # initialize parameters from a normal distribution
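Gluon infers parameter shapes lazily, so the actual weight values only become available after the first forward pass (a minimal sketch; the input below is arbitrary):

net(nd.random.normal(shape=(2, num_inputs)))   # first forward pass triggers shape inference
print(net[0].weight.data(), net[0].bias.data())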
The wd argument adds L2 regularization (weight decay) to the model. In MXNet's SGD the decay is folded into the gradient, so the update is effectively w = w - lr * (grad + wd * w).
trainer = gluon.Trainer(net.collect_params(), 'sgd', {
    'learning_rate': learning_rate, 'wd': weight_decay})
trainer.step(batch_size) must be called after every backward pass; it applies the parameter update. A typical training loop looks like this:
for e in range(epochs):
    for data, label in data_iter_train:
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        trainer.step(batch_size)
    train_loss.append(test(net, X_train, y_train))
    test_loss.append(test(net, X_test, y_test))
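The square_loss used above is not defined in this excerpt; in Gluon the equivalent built-in is the L2 loss:

from mxnet.gluon import loss as gloss
square_loss = gloss.L2Loss()   # computes 0.5 * (y_hat - y)^2 per example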
Flatten (collapse all dimensions except the batch axis)
gluon.nn.Flatten()
Fully connected layer
gluon.nn.Dense(256, activation="relu")
The first argument is the number of output units.
Cross-entropy loss
loss = gloss.SoftmaxCrossEntropyLoss()   # gloss is mxnet.gluon.loss
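Putting these layers together, a small MLP classifier in Gluon might look like this (a minimal sketch; the layer sizes are arbitrary):

from mxnet import init
from mxnet.gluon import nn, loss as gloss

net = nn.Sequential()
net.add(nn.Flatten(),                      # e.g. (batch, 1, 28, 28) -> (batch, 784)
        nn.Dense(256, activation="relu"),  # hidden layer
        nn.Dense(10))                      # output logits, one per class
net.initialize(init.Normal(sigma=0.01))
loss = gloss.SoftmaxCrossEntropyLoss()     # applies softmax internally, expects raw logits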
Source (original Chinese post): https://www.cnblogs.com/hellcat/p/9038649.html