Abstract
Stochastic gradient descent (SGD) remains the workhorse for many practical problems. However, it converges slowly and can be difficult to tune. It is ......
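As a point of reference for the method the abstract discusses, plain SGD can be sketched as follows on a toy one-dimensional least-squares problem. The problem, learning rate, and iteration count here are illustrative assumptions, not taken from the paper; the fixed step size `lr` is the hyperparameter whose tuning difficulty the abstract alludes to.

```python
import random

def sgd(data, lr=0.01, epochs=100):
    """Fit scalar w minimizing (w*x - y)^2 via stochastic gradient steps.

    Illustrative sketch only: toy problem and hyperparameters are assumptions.
    """
    w = 0.0
    for _ in range(epochs):
        random.shuffle(data)          # visit samples in random order each epoch
        for x, y in data:
            grad = 2 * (w * x - y) * x  # gradient of (w*x - y)^2 w.r.t. w
            w -= lr * grad              # single stochastic update
    return w

# Data generated from y = 3x, so the fitted w should approach 3.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
w = sgd(data)
```

The per-step update uses only one sample's gradient, which keeps each iteration cheap but makes progress noisy and sensitive to the choice of `lr`.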