Abstract
State-of-the-art video object detection methods maintain a memory structure, either a sliding window or a memory queue, to enhance the current frame using attention mechanisms. However, we argue that these memory structures are not efficient or sufficient because of two implied operations: (1) concatenating all features in memory for enhancement, leading to a heavy computational cost; (2) frame-wise memory updating, preventing the memory from capturing more temporal information. In this paper, we propose a multi-level aggregation architecture via memory bank called MAMBA. Specifically, our memory bank employs two novel operations to eliminate disadvantages of existing methods: (1) light-weight key-set construction which can significantly reduce the computational cost; (2) fine-grained feature-wise updating strategy which enables our method to utilize knowledge from the whole video. To better enhance features from complementary levels, i.e., feature maps and proposals, we further propose a generalized enhancement operation (GEO) to aggregate multi-level features in a unified manner. We conduct extensive evaluations on the challenging ImageNetVID dataset. Compared with existing state-of-the-art methods, our method achieves superior performance in terms of both speed and accuracy. More remarkably, MAMBA achieves mAP of 83.7%/84.6% at 12.6/9.1 FPS with ResNet-101.
Original language | English |
---|---|
Title of host publication | 35th AAAI Conference on Artificial Intelligence: Proceedings |
Pages | 2620-2627 |
Volume | 35 |
Edition | 3;AAAI-21 Technical Tracks 3 |
Publication status | Published - 18 May 2021 |
Publication series
Name | Proceedings of the AAAI Conference on Artificial Intelligence |
---|---|
Publisher | AAAI Press |
ISSN (Print) | 2159-5399 |
ISSN (Electronic) | 2374-3468 |
Fingerprint
Dive into the research topics of 'MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection'. Together they form a unique fingerprint.Student theses
-
Towards effective and efficient video object detection
Sun, G. (Author), Hua, Y. (Supervisor), Wang, H. (Supervisor) & Robertson, N. (Supervisor), Jul 2023Student thesis: Doctoral Thesis › Doctor of Philosophy
File