Efficient one-stage video object detection by exploiting temporal consistency

Guanxiong Sun*, Yang Hua, Guosheng Hu, Neil Robertson

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)


Recently, one-stage detectors have achieved competitive accuracy and faster speed compared with traditional two-stage detectors on image data. However, in the field of video object detection (VOD), most existing VOD methods are still based on two-stage detectors. Moreover, directly adapting existing VOD methods to one-stage detectors introduces unaffordable computational costs. In this paper, we first analyse the computational bottlenecks of using one-stage detectors for VOD. Based on the analysis, we present a simple yet efficient framework to address the computational bottlenecks and achieve efficient one-stage VOD by exploiting the temporal consistency in video frames. Specifically, our method consists of a location prior network to filter out background regions and a size prior network to skip unnecessary computations on low-level feature maps for specific frames. We test our method on various modern one-stage detectors and conduct extensive experiments on the ImageNet VID dataset. Excellent experimental results demonstrate the superior effectiveness, efficiency, and compatibility of our method. The code is available at https://github.com/guanxiongsun/EOVOD .

Original language: English
Title of host publication: Proceeding of the 17th European Conference on Computer Vision
Editors: Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Publisher: Springer Nature Switzerland AG
ISBN (Electronic): 9783031198335
ISBN (Print): 9783031198328
Publication status: Published - 04 Nov 2022
Event: European Conference on Computer Vision - Israel, Tel-Aviv, Israel
Duration: 23 Oct 2022 to 27 Oct 2022

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: European Conference on Computer Vision
Abbreviated title: ECCV 2022

