SC-RANK: Improving Convolutional Image Captioning with Self-Critical Learning and Ranking Metric-based Reward

Shiyang Yan, Yang Hua, Neil Robertson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Image captioning usually employs a Recurrent Neural Network (RNN) to decode image features from a Convolutional Neural Network (CNN) into a sentence. The RNN is trained under Maximum Likelihood Estimation (MLE). However, this approach suffers from inherent issues such as the complex memorising mechanism of RNNs and the exposure bias introduced by MLE. Recently, convolutional captioning models have shown advantages in their simpler architecture and parallel training capability. Nevertheless, MLE training still introduces exposure bias, which prevents the model from achieving better performance. In this paper, we show that the self-critical algorithm can optimise a CNN-based captioning model and alleviate this problem. We propose a ranking metric-based reward, denoted SC-RANK, which uses sentence embeddings from a pre-trained language model to generate more diversified captions. SC-RANK avoids the tedious tuning of a specially-designed language model, and the knowledge transferred from the pre-trained language model proves helpful for image captioning. State-of-the-art results are obtained on the MSCOCO dataset with the proposed SC-RANK.
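The core idea of self-critical learning with a ranking-style reward can be illustrated in a few lines: the reward of a sampled caption is compared against the reward of the greedily decoded caption, which serves as the baseline. The sketch below is a minimal illustration under the assumption that the reward is the cosine similarity between sentence embeddings; the actual SC-RANK reward, encoder, and training loop are defined in the paper, and the vectors here are hypothetical toy values.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def self_critical_advantage(sampled_emb, greedy_emb, reference_emb):
    """Self-critical advantage with a ranking-style reward.

    The reward for a caption is its embedding similarity to a reference
    embedding (e.g. from a pre-trained sentence encoder). The greedy
    decode acts as the baseline, as in the self-critical algorithm, so
    the policy gradient is weighted by r(sample) - r(greedy).
    """
    r_sample = cosine_sim(sampled_emb, reference_emb)
    r_greedy = cosine_sim(greedy_emb, reference_emb)
    return r_sample - r_greedy

# Toy vectors standing in for sentence embeddings (hypothetical values).
ref = np.array([1.0, 0.0, 0.0])
sampled = np.array([0.9, 0.1, 0.0])   # closer to the reference embedding
greedy = np.array([0.5, 0.5, 0.0])    # the baseline caption's embedding

adv = self_critical_advantage(sampled, greedy, ref)
# A positive advantage reinforces the sampled caption over the baseline.
```

In a full training loop, this scalar advantage would multiply the negative log-likelihood of the sampled caption, so captions that outrank the greedy baseline are reinforced while worse ones are suppressed.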
Original language: English
Title of host publication: Proceedings of the British Machine Vision Conference 2019
Publisher: Springer
Number of pages: 14
Publication status: Accepted - 01 Jul 2019

Publication series

Name: Communications in Computer and Information Science
Publisher: Springer
ISSN (Print): 1865-0929
