The optimization problem for learning a linearly separable support vector machine:
\[\begin{array}{ll}
\min & \frac 1 2 w^Tw \\
\text{s.t.} & 1-y_i(w^T x_i+b) \le 0 ,\quad i=1,2,\dots,N
\end{array}
\]
Construct the Lagrangian, with multipliers \(\alpha_i \ge 0\):
\[L(w,b,\alpha) = \frac 1 2 w^Tw + \sum_{i=1}^N \alpha_i[1-y_i(w^T x_i+b)]
\]
By Lagrangian duality, the dual of the primal problem is the max-min problem
\[\max_\alpha \min_{w,b} L(w,b,\alpha)
\]
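A standard remark the original post leaves unstated: the primal problem itself can be written as the min-max problem \(\min_{w,b}\max_{\alpha\ge 0} L(w,b,\alpha)\), and weak duality gives
\[\max_{\alpha\ge 0}\min_{w,b} L(w,b,\alpha) \le \min_{w,b}\max_{\alpha\ge 0} L(w,b,\alpha)
\]
Because the objective is convex and the constraints are affine, strong duality holds here, so the two values coincide and the dual solution recovers the primal one.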
Therefore, to obtain the solution of the dual problem, we first minimize \(L(w,b,\alpha)\) with respect to \(w,b\), and then maximize with respect to \(\alpha\).
\((1)\) Find \(\min_{w,b} L(w,b,\alpha)\). Setting the gradients of \(L\) with respect to \(w\) and \(b\) to zero:
\[\begin{array}{l}
\nabla_w L(w,b,\alpha)=w-\sum_{i=1}^N\alpha_iy_ix_i=0 \\
\nabla_b L(w,b,\alpha)=-\sum_{i=1}^N\alpha_iy_i=0
\end{array}
\]
which gives
\[\begin{array}{l}
w=\sum_{i=1}^N\alpha_iy_ix_i \\
\sum_{i=1}^N\alpha_iy_i=0
\end{array}
\]
Substituting these back into the Lagrangian gives
\[\begin{split}
L(w,b,\alpha) & = \frac 1 2 \sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_jy_iy_j(x_i^T x_j)-\sum_{i=1}^N\alpha_iy_i\left(\left(\sum_{j=1}^N\alpha_jy_jx_j\right)^Tx_i+b\right)+\sum_{i=1}^N\alpha_i \\
& = -\frac 1 2 \sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_jy_iy_j(x_i^T x_j)+\sum_{i=1}^N\alpha_i
\end{split}
\]
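The second line follows because the middle term expands as
\[-\sum_{i=1}^N\alpha_iy_i\left(\left(\sum_{j=1}^N\alpha_jy_jx_j\right)^Tx_i+b\right)=-\sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_jy_iy_j(x_i^T x_j)-b\sum_{i=1}^N\alpha_iy_i
\]
and the final sum vanishes by the constraint \(\sum_{i=1}^N\alpha_iy_i=0\) (a step the original leaves implicit).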
\((2)\) Maximize \(\min_{w,b}L(w,b,\alpha)\) with respect to \(\alpha\); this is the dual problem:
\[\begin{array}{ll}
\max_{\alpha} & -\frac 1 2 \sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_jy_iy_j(x_i^T x_j)+\sum_{i=1}^N\alpha_i \\
\text{s.t.} & \sum_{i=1}^N\alpha_iy_i=0 \\
 & \alpha_i \ge 0,\quad i=1,2,\dots,N
\end{array}
\]
Multiplying the objective above by \(-1\) turns the maximization into a minimization and yields the following equivalent dual optimization problem:
\[\begin{array}{ll}
\min_{\alpha} & \frac 1 2 \sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_jy_iy_j(x_i^T x_j)-\sum_{i=1}^N\alpha_i \\
\text{s.t.} & \sum_{i=1}^N\alpha_iy_i=0 \\
 & \alpha_i \ge 0,\quad i=1,2,\dots,N
\end{array}
\]
Solving this problem yields the optimal solution \(\alpha^*=(\alpha_1^*,\alpha_2^*,\dots,\alpha_N^*)^T\) of the dual problem.
From the dual optimal solution \(\alpha^*\), the primal optimal solution \((w^*,b^*)\) can be obtained as follows:
there must exist some index \(j\) such that \(\alpha_j^* \gt 0\), and then
\[\begin{array}{l}
w^*=\sum_{i=1}^N\alpha_i^*y_ix_i \\
b^*=y_j-\sum_{i=1}^N\alpha_i^*y_i(x_i^Tx_j)
\end{array}
\]
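Before turning to the proof, a small worked instance (my own illustration, not from the original post): take \(N=2\) with \(x_1=(1,1)^T, y_1=+1\) and \(x_2=(0,0)^T, y_2=-1\). The constraint \(\alpha_1-\alpha_2=0\) forces \(\alpha_1=\alpha_2=\alpha\), and the dual objective reduces to
\[\frac 1 2\left(2\alpha^2\right)-2\alpha=\alpha^2-2\alpha
\]
which is minimized at \(\alpha^*=(1,1)^T\). The formulas above then give \(w^*=x_1-x_2=(1,1)^T\) and, taking \(j=1\), \(b^*=y_1-\left(1\cdot 1\cdot(x_1^Tx_1)+1\cdot(-1)\cdot(x_2^Tx_1)\right)=1-2=-1\); both points satisfy \(y_i(w^{*T}x_i+b^*)=1\), i.e. they lie exactly on the margin.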
Proof:
Let \(\alpha^*=(\alpha_1^*,\alpha_2^*,\dots,\alpha_N^*)^T\) be a solution of the dual optimization problem, and let \(w^*,b^*\) be a solution of the primal optimization problem.
Since the KKT conditions hold (in order below: stationarity in \(w\) and \(b\), complementary slackness, primal feasibility, dual feasibility), we have
\[\begin{array}{l}
\nabla_w L(w^*,b^*,\alpha^*)=w^*-\sum_{i=1}^N\alpha_i^*y_ix_i=0 \\
\nabla_b L(w^*,b^*,\alpha^*)=-\sum_{i=1}^N\alpha_i^*y_i=0 \\
\alpha_i^*(1-y_i(w^{*T}x_i+b^*))=0 \\
1-y_i(w^{*T}x_i+b^*)\le 0 \\
\alpha_i^*\ge 0,\quad i=1,2,\dots,N
\end{array}
\]
From the first condition we get
\[w^*=\sum_{i=1}^N\alpha_i^*y_ix_i
\]
At least one \(\alpha_j^* \gt 0\) (by contradiction: if \(\alpha^*=0\), the equation above would give \(w^*=0\), and \(w^*=0\) is certainly not an optimal solution of the primal problem, a contradiction). For this \(j\), the complementary slackness condition requires
\[\alpha_j^*(1-y_j(w^{*T}x_j+b^*))=0
\]
and since \(\alpha_j^* \gt 0\), it follows that
\[1-y_j(w^{*T}x_j+b^*)=0
\]
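Spelling out the step the original leaves implicit: multiplying through by \(y_j\) and using \(y_j^2=1\) gives
\[w^{*T}x_j+b^*=y_j
\]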
Substituting \(w^*=\sum_{i=1}^N\alpha_i^*y_ix_i\) into this, we obtain
\[b^*=y_j-\sum_{i=1}^N\alpha_i^*y_i(x_i^Tx_j)
\]
This completes the proof.
In other words, once the optimal solution \(\alpha^*\) of the dual problem has been found (for example, by the SMO algorithm), the optimal solution \(w^*,b^*\) of the primal problem follows directly.
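As a minimal numerical sketch (my own, not from the original post; it uses scipy's general-purpose SLSQP solver rather than SMO), the following solves the minimization form of the dual on a toy dataset and recovers \((w^*,b^*)\) with the formulas above:

```python
# A sketch, not the original post's code: solve the dual QP with scipy's
# SLSQP solver, then recover (w*, b*) from alpha* as derived above.
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (assumed for illustration).
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
N = len(y)

# Q[i, j] = y_i y_j (x_i^T x_j), so the dual objective is
# (1/2) alpha^T Q alpha - sum_i alpha_i  (the minimization form).
Q = (y[:, None] * y[None, :]) * (X @ X.T)

res = minimize(
    fun=lambda a: 0.5 * a @ Q @ a - a.sum(),
    x0=np.zeros(N),
    method="SLSQP",
    bounds=[(0.0, None)] * N,                              # alpha_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum_i alpha_i y_i = 0
)
alpha = res.x

# w* = sum_i alpha_i y_i x_i
w = (alpha * y) @ X

# Pick an index j with alpha_j > 0, then b* = y_j - sum_i alpha_i y_i (x_i^T x_j)
j = int(np.argmax(alpha))
b = y[j] - (alpha * y) @ (X @ X[j])

print("alpha* =", alpha)               # nonzero entries mark support vectors
print("w* =", w, "b* =", b)
print("margins y_i(w^T x_i + b):", y * (X @ w + b))  # >= 1, =1 at support vectors
```

For this dataset the solver should return approximately \(\alpha^*=(0.25,0,0.25)^T\), \(w^*=(0.5,0.5)^T\) and \(b^*=-2\); only the two support vectors \(x_1\) and \(x_3\) carry nonzero multipliers.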
Summary: converting the linearly separable SVM into its dual problem, and recovering the primal optimal solution from the dual optimal solution (SMO not covered).
Original post: https://www.cnblogs.com/rising-sun/p/14290444.html