Fine-grained Audible Video Description

Shen, XY; Li, D; Zhou, JX; Qin, Z; He, BW; Han, XD; Li, AX; Dai, YC; Kong, LP; Wang, M; Qiao, Y; Zhong, YR

Zhong, YR (corresponding author), Shanghai Artificial Intelligence Lab, Shanghai, People's Republic of China.

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 10585

Abstract

We explore a new task for audio-visual-language modeling called fine-grained audible video description (FAVD). It aims to provide detailed textual des......
