Efficient one-stage video object detection by exploiting temporal consistency

Guanxiong Sun*, Yang Hua, Guosheng Hu, Neil Robertson

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Recently, one-stage detectors have achieved competitive accuracy and faster speed compared with traditional two-stage detectors on image data. However, in the field of video object detection (VOD), most existing VOD methods are still based on two-stage detectors. Moreover, directly adapting existing VOD methods to one-stage detectors introduces unaffordable computational costs. In this paper, we first analyse the computational bottlenecks of using one-stage detectors for VOD. Based on the analysis, we present a simple yet efficient framework to address the computational bottlenecks and achieve efficient one-stage VOD by exploiting the temporal consistency in video frames. Specifically, our method consists of a location prior network to filter out background regions and a size prior network to skip unnecessary computations on low-level feature maps for specific frames. We test our method on various modern one-stage detectors and conduct extensive experiments on the ImageNet VID dataset. Excellent experimental results demonstrate the superior effectiveness, efficiency, and compatibility of our method. The code is available at https://github.com/guanxiongsun/EOVOD .

Original language: English
Title of host publication: Proceeding of the 17th European Conference on Computer Vision
Editors: Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Publisher: Springer Nature Switzerland AG
Pages: 1-16
Volume: XXXV
ISBN (Electronic): 9783031198335
ISBN (Print): 9783031198328
DOIs
Publication status: Published - 04 Nov 2022
Event: European Conference on Computer Vision - Israel, Tel-Aviv, Israel
Duration: 23 Oct 2022 to 27 Oct 2022

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 13695
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Computer Vision
Abbreviated title: ECCV 2022
Country/Territory: Israel
City: Tel-Aviv
Period: 23/10/2022 to 27/10/2022
