Linear regression with one variable
Given the "right answer" for each example in the data: for every training example, the "correct answer" is provided.
Predict real-valued output: from the given data, we predict a real-valued output.
Notation (commonly used symbols):
Training set -> Learning Algorithm -> h
The training set is "fed" to the learning algorithm, which outputs a function h (the hypothesis).
x -> h -> y
\(h\) is a map from \(x\)'s to \(y\)'s.
How do we represent \(h\)?
\(h_{\theta}(x) = \theta_0 + \theta_1 \times x\).
Role of the data set and the hypothesis: predict \(y\) as a linear function of \(x\).
How do we fit the most likely straight line to our data?
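As a minimal Python sketch of the hypothesis (the function name `hypothesis` and the sample numbers below are illustrative, not from the course):

```python
def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Example: with theta0 = 1.0 and theta1 = 0.5, the input x = 4 is mapped to 3.0.
print(hypothesis(1.0, 0.5, 4))  # 3.0
```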
Choose \(\theta_0, \theta_1\) so that \(h_{\theta}(x)\) is close to \(y\) for our training examples (\(x, y\)).
\(J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^m(h_{\theta}(x^i) - y^i)^2\)
Goal: find \(\theta_0, \theta_1\) that minimize \(J(\theta_0, \theta_1)\). Here \(J(\theta_0, \theta_1)\) is called the cost function.
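A small sketch of this cost function in Python, assuming plain lists for the training data (the name `compute_cost` is illustrative):

```python
def compute_cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = (1 / 2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(xs)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)
```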
\(\theta_0 = 0 \rightarrow h_{\theta}(x) = \theta_1x\)
\(J(\theta_1) = \frac{1}{2m} \sum_{i=1}^m(h_{\theta}(x^i) - y^i)^2\)
Goal: find \(\theta_1\) to minimize \(J(\theta_1)\)
Example: for the sample points (1, 1), (2, 2), (3, 3), plot the relationship between the hypothesis function and the cost function.
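Evaluating the cost for this data set (with \(\theta_0 = 0\)) shows how \(J(\theta_1)\) changes with the slope and why \(\theta_1 = 1\) is the minimizer:

$$J(1) = \frac{1}{2 \cdot 3}(0^2 + 0^2 + 0^2) = 0, \quad J(0.5) = \frac{1}{6}(0.5^2 + 1^2 + 1.5^2) = \frac{3.5}{6} \approx 0.58, \quad J(0) = \frac{1}{6}(1^2 + 2^2 + 3^2) = \frac{14}{6} \approx 2.33$$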
Have some function \(J(\theta_0, \theta_1, \theta_2, \ldots, \theta_n)\)
Want to find \(\theta_0, \theta_1, \theta_2, \ldots, \theta_n\) to minimize \(J(\theta_0, \theta_1, \theta_2, \ldots, \theta_n)\)
Simplify -> \(\theta_0, \theta_1\)
repeat until convergence {
\(\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1)\) (for \(j = 0\) and \(j = 1\))
}
Here \(\alpha\) is the learning rate: it controls how big a step we take when updating the parameter \(\theta_j\).
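A minimal sketch of the generic update rule in Python, using the toy cost \(J(\theta) = (\theta - 3)^2\) (an assumption for illustration, not from the course) to show how \(\alpha\) scales each step:

```python
def gradient_step(theta, grad, alpha):
    """One gradient descent step: theta := theta - alpha * dJ/dtheta."""
    return theta - alpha * grad

theta, alpha = 0.0, 0.1
for _ in range(50):
    grad = 2 * (theta - 3)              # derivative of the toy cost (theta - 3)^2
    theta = gradient_step(theta, grad, alpha)
print(theta)                            # approaches the minimum at theta = 3
```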
Correct: simultaneous update (both parameters are updated at the same time)
\(temp0 := \theta_0 - \alpha\frac{\partial}{\partial\theta_0}J(\theta_0, \theta_1)\)
\(temp1 := \theta_1 - \alpha\frac{\partial}{\partial\theta_1}J(\theta_0, \theta_1)\)
\(\theta_0 := temp0\)
\(\theta_1 := temp1\)
Incorrect: not a simultaneous update (\(\theta_0\) is overwritten before \(temp1\) is computed)
\(temp0 := \theta_0 - \alpha\frac{\partial}{\partial\theta_0}J(\theta_0, \theta_1)\)
\(\theta_0 := temp0\)
\(temp1 := \theta_1 - \alpha\frac{\partial}{\partial\theta_1}J(\theta_0, \theta_1)\)
\(\theta_1 := temp1\)
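A sketch of the two update orders in Python; the callables `d_theta0` and `d_theta1`, which stand for the partial derivatives of \(J\), are assumptions for illustration:

```python
def simultaneous_update(theta0, theta1, alpha, d_theta0, d_theta1):
    # Correct: both temporaries are computed from the old (theta0, theta1).
    temp0 = theta0 - alpha * d_theta0(theta0, theta1)
    temp1 = theta1 - alpha * d_theta1(theta0, theta1)
    return temp0, temp1

def sequential_update(theta0, theta1, alpha, d_theta0, d_theta1):
    # Incorrect: theta0 has already changed when theta1's derivative is evaluated.
    theta0 = theta0 - alpha * d_theta0(theta0, theta1)
    theta1 = theta1 - alpha * d_theta1(theta0, theta1)
    return theta0, theta1
```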
Suppose \(\theta_1\) is initialized exactly at a local minimum. The slope there is 0, so the derivative term is 0, and the gradient descent update leaves it unchanged: \(\theta_1 := \theta_1\).
Gradient descent can converge to a local minimum, even with the learning rate \(\alpha\) fixed.
As we approach a local minimum, gradient descent will automatically take smaller steps, because the slope (and hence the derivative term) shrinks. So there is no need to decrease \(\alpha\) over time.
$\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1) = \frac{\partial}{\partial\theta_j} \frac{1}{2m} \sum_{i = 1}^m(h_\theta(x^i) - y^i)^2 = \frac{\partial}{\partial\theta_j} \frac{1}{2m} \sum_{i = 1}^m(\theta_0 + \theta_1x^i - y^i)^2$
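Differentiating for each parameter gives the two partial derivatives used in the update rules below:

$$\frac{\partial}{\partial\theta_0}J(\theta_0, \theta_1) = \frac{1}{m}\sum_{i=1}^m(h_\theta(x^i) - y^i), \qquad \frac{\partial}{\partial\theta_1}J(\theta_0, \theta_1) = \frac{1}{m}\sum_{i=1}^m(h_\theta(x^i) - y^i)\,x^i$$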
repeat until convergence {
$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m(h_{\theta}(x^i) - y^i)$
$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^m(h_{\theta}(x^i) - y^i) \times x^i$
}
"Batch": Each step of gradient descent uses all the training examples. 每迭代一步,都要用到训练集的所有数据。
Machine Learning - Andrew Ng, Study Notes (2)
Source: https://www.cnblogs.com/songjy11611/p/12173201.html