Video Understanding
Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering
Zhaohe Liao*, Jiangtong Li*, Li Niu, Liqing Zhang.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)