Abstract
Visual grounding is a vision-and-language understanding task that aims to locate the region of an image referred to by a given query phrase. However, mo......