Abstract
We consider the stochastic contextual bandit problem with additional regularization. The motivation comes from problems where the policy of the agent ......