TY - GEN
T1 - Multimodal neural machine translation for low-resource language pairs using synthetic data
AU - Chowdhury, Koel Dutta
AU - Hasanuzzaman, Mohammed
AU - Liu, Qun
PY - 2018/7/19
Y1 - 2018/7/19
N2 - In this paper, we investigate the effectiveness of training a multimodal neural machine translation (MNMT) system with image features for a low-resource language pair, Hindi and English, using synthetic data. A three-way parallel corpus that contains bilingual texts and corresponding images is required to train an MNMT system with image features. However, such a corpus is not available for low-resource language pairs. To address this, we developed both a synthetic training dataset and a manually curated development/test dataset for Hindi based on an existing English-image parallel corpus. We used these datasets to build our image description translation system by adopting state-of-the-art MNMT models. Our results show that it is possible to train an MNMT system for low-resource language pairs through the use of synthetic data and that such a system can benefit from image features.
AB - In this paper, we investigate the effectiveness of training a multimodal neural machine translation (MNMT) system with image features for a low-resource language pair, Hindi and English, using synthetic data. A three-way parallel corpus that contains bilingual texts and corresponding images is required to train an MNMT system with image features. However, such a corpus is not available for low-resource language pairs. To address this, we developed both a synthetic training dataset and a manually curated development/test dataset for Hindi based on an existing English-image parallel corpus. We used these datasets to build our image description translation system by adopting state-of-the-art MNMT models. Our results show that it is possible to train an MNMT system for low-resource language pairs through the use of synthetic data and that such a system can benefit from image features.
U2 - 10.18653/v1/W18-3405
DO - 10.18653/v1/W18-3405
M3 - Conference contribution
AN - SCOPUS:85077627118
T3 - ACL Proceedings
SP - 33
EP - 42
BT - Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
PB - Association for Computational Linguistics
T2 - ACL Workshop on Deep Learning Approaches for Low-Resource NLP 2018
Y2 - 19 July 2018 through 19 July 2018
ER -