Object Guided External Memory Network for Video Object Detection

Hanming Deng, Yang Hua, Tao Song, Zongpu Zhang, Zhengui Xue, Ruhui Ma, Neil Robertson, Haibing Guan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution



Video object detection is more challenging than image object detection because of deteriorated frame quality. To enhance the feature representation, state-of-the-art methods propagate temporal information into the deteriorated frame by aligning and aggregating entire feature maps from multiple nearby frames. However, restricted by the low storage efficiency of feature maps and their vulnerable content-address allocation, these methods do not fully exploit long-term temporal information. In this work, we propose the first object guided external memory network for online video object detection. Storage efficiency is handled by object guided hard attention, which selectively stores valuable features, and long-term information is protected by storing it in an addressable external data matrix. A set of read/write operations is designed to accurately propagate, allocate, and delete multi-level memory features under object guidance. We evaluate our method on the ImageNet VID dataset and achieve state-of-the-art performance as well as a good speed-accuracy tradeoff. Furthermore, by visualizing the external memory, we show the detailed object-level reasoning process across frames.
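The read, write, and delete operations named in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the class `ExternalMemory`, its methods, the dictionary-based feature map, the dot-product read, and the oldest-first eviction rule are all hypothetical simplifications chosen only to illustrate the idea of storing object features selectively in a bounded, addressable memory.

```python
import math


class ExternalMemory:
    """Toy bounded memory of feature vectors (hypothetical, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []  # each slot is one stored feature vector

    def write(self, feature_map, boxes):
        """Hard attention: keep only features located inside a detected box.

        feature_map maps (x, y) -> feature vector; boxes are (x0, y0, x1, y1).
        """
        for (x, y), feat in feature_map.items():
            if any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes):
                self.slots.append(list(feat))
        # Assumed deletion rule: drop the oldest slots when over capacity.
        if len(self.slots) > self.capacity:
            self.slots = self.slots[-self.capacity:]

    def read(self, query):
        """Return a softmax-weighted average of slots, scored by dot product."""
        if not self.slots:
            return list(query)  # nothing stored yet, so pass the query through
        scores = [sum(q * s for q, s in zip(query, slot)) for slot in self.slots]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        return [
            sum(w * slot[i] for w, slot in zip(weights, self.slots)) / total
            for i in range(len(query))
        ]


# Usage: only the feature at (5, 5) lies inside the detected box.
memory = ExternalMemory(capacity=2)
frame = {(0, 0): [1.0, 0.0], (5, 5): [0.0, 1.0], (9, 9): [0.5, 0.5]}
memory.write(frame, boxes=[(4, 4, 6, 6)])
aggregated = memory.read([0.0, 2.0])
```

Storing only the features that fall inside detections is what keeps the memory small in this sketch; a whole-feature-map buffer would fill the same capacity after a single frame.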
Original language: English
Title of host publication: 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
Publisher: IEEE
Publication status: Published - 27 Feb 2020

Publication series

ISSN (Print): 1550-5499
ISSN (Electronic): 2380-7504

