Besides fast convergence, SGD has sometimes been observed to yield significantly better generalization errors than batch methods [1].
That is, the update formula is derived from a quadratic approximation ...
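The note does not show which update formula is meant. As an illustration only, here is the standard derivation from a second-order Taylor model around the current iterate. The symbols $w_t$, $g_t$, $H_t$ and $\eta$ are assumptions introduced for this sketch, not taken from the original, and $H_t$ is assumed positive definite.

```latex
\begin{aligned}
f(w) &\approx f(w_t) + g_t^\top (w - w_t)
        + \tfrac{1}{2}\,(w - w_t)^\top H_t\,(w - w_t),
        \qquad g_t = \nabla f(w_t), \\
0 &= g_t + H_t\,(w_{t+1} - w_t)
        \qquad \text{(set the gradient of the model to zero)}, \\
w_{t+1} &= w_t - H_t^{-1} g_t .
\end{aligned}
```

Under the further assumption $H_t \approx \tfrac{1}{\eta} I$, this reduces to the plain gradient step $w_{t+1} = w_t - \eta\, g_t$.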
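Meaning the results are acceptable.
Then a series of approximations is made...
The literature review is really impressive, Bottou's especially.
Also, the version of the paper on arXiv is more complete and more finely structured...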
Original post: https://www.cnblogs.com/cx2016/p/13746800.html