Abstract
The dominant video captioning methods employ the attentional encoder-decoder architecture, where the decoder is an autoregressive structure that gener......