Abstract
We consider generation and comprehension of natural language referring expressions for objects in an image. Unlike generic "image captioning", which lac......