Positional mask attention for video sequence modeling

Jiaxuan Wang, Chaoyi Wang, Yang Hua, Tao Song, Zhengui Xue, Ruhui Ma, Haibing Guan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The attention mechanism has been widely developed in different domains. Some recent studies apply position embedding to encode relative positions in the attention mechanism, learning better representations in both natural language processing and computer vision tasks. However, position embedding suffers from the 'fixed input size' problem and requires substantial additional memory to store the position embedding parameters. In this paper, we present positional mask attention, a new approach to incorporating position information into the attention mechanism. Specifically, we propose a positional distance mask that encodes relative positions as a type of prior knowledge, in contrast to existing position embedding methods. To verify the generality and effectiveness of the proposed method, we evaluate positional mask attention on two general video understanding tasks: video object detection and video instance segmentation. Experimental results demonstrate that our method achieves significant improvement by aggregating position information.
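The idea of a parameter-free positional distance mask can be illustrated with a short sketch. The abstract does not state the exact form of the mask, so everything specific here is an assumption made for illustration: the Gaussian decay over the frame distance |i - j|, the `sigma` parameter, and the function names `distance_bias` and `masked_self_attention` are hypothetical and are not taken from the paper. The sketch only shows why such a mask needs no stored position parameters and works for any sequence length.

```python
import numpy as np

def distance_bias(length, sigma=2.0):
    """Log of an assumed Gaussian distance mask: 0 on the diagonal,
    increasingly negative as the distance |i - j| grows."""
    idx = np.arange(length)
    dist = np.abs(idx[:, None] - idx[None, :])
    return -(dist ** 2) / (2.0 * sigma ** 2)

def masked_self_attention(x, sigma=2.0):
    """Scaled dot-product self-attention over x of shape (length, dim),
    with logits biased by the distance mask (illustrative only)."""
    length, dim = x.shape
    logits = x @ x.T / np.sqrt(dim)
    # Adding the log-mask to the logits is equivalent to multiplying the
    # unnormalized attention weights by the mask before normalizing.
    logits = logits + distance_bias(length, sigma)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 4))   # 6 frames, 4-dimensional features
out, weights = masked_self_attention(frames)
print(out.shape)  # (6, 4)
```

Because the mask is computed from position indices alone, the same function applies unchanged to a sequence of 6 frames or 60, which is the property the abstract contrasts with learned position embeddings.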

Original language: English
Title of host publication: 2021 14th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI): Proceedings
Editors: Qingli Li, Lipo Wang, Yan Wang, Wenwu Li
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 5
ISBN (Electronic): 9781665400039
ISBN (Print): 9781665400053
Publication status: Published - 07 Dec 2021
Externally published: Yes
Event: 14th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2021 - Shanghai, China
Duration: 23 Oct 2021 to 25 Oct 2021

Publication series

Name: Proceedings - International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI

Conference

Conference: 14th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2021
Country/Territory: China
City: Shanghai
Period: 23/10/2021 to 25/10/2021

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems
  • Signal Processing
  • Information Systems and Management
  • Biomedical Engineering
