Multimodal neural machine translation for low-resource language pairs using synthetic data

Koel Dutta Chowdhury, Mohammed Hasanuzzaman, Qun Liu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper, we investigate the effectiveness of training a multimodal neural machine translation (MNMT) system with image features for a low-resource language pair, Hindi and English, using synthetic data. Training an MNMT system with image features requires a three-way parallel corpus that contains bilingual texts and their corresponding images. However, no such corpus is available for low-resource language pairs. To address this, we developed both a synthetic training dataset and a manually curated development/test dataset for Hindi, based on an existing English-image parallel corpus. We used these datasets to build our image description translation system by adopting state-of-the-art MNMT models. Our results show that it is possible to train an MNMT system for low-resource language pairs using synthetic data, and that such a system can benefit from image features.

Original language: English
Title of host publication: Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Publisher: Association for Computational Linguistics
Pages: 33-42
Number of pages: 10
ISBN (Electronic): 9781948087476
DOIs
Publication status: Published - 19 Jul 2018
Externally published: Yes
Event: ACL Workshop on Deep Learning Approaches for Low-Resource NLP 2018 - Melbourne, Australia
Duration: 19 Jul 2018 to 19 Jul 2018

Publication series

Name: ACL Proceedings
ISSN (Print): 0736-587X

Conference

Conference: ACL Workshop on Deep Learning Approaches for Low-Resource NLP 2018
Abbreviated title: ACL DeepLo Workshop 2018
Country/Territory: Australia
City: Melbourne
Period: 19/07/2018 to 19/07/2018

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
