Terminology translation in low-resource scenarios

Rejwanul Haque*, Mohammed Hasanuzzaman, Andy Way

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Evaluating term translation quality in machine translation (MT), which is usually done by domain experts, is a time-consuming and expensive task. In fact, such manual evaluation is impractical in an industrial setting, where customised MT systems often need to be updated for many reasons (e.g., availability of new training data, leading MT techniques). To the best of our knowledge, there is as yet no publicly available solution for evaluating terminology translation in MT automatically. Hence, there is a genuine need for a faster and less expensive solution to this problem, one that could help end-users identify term translation problems in MT instantly. This study presents such a strategy for evaluating terminology translation in MT. High correlations of our evaluation results with human judgements demonstrate the effectiveness of the proposed solution. The paper also introduces a classification framework, TermCat, that can automatically classify term translation-related errors and expose specific problems in relation to terminology translation in MT. We carried out our experiments with a low-resource language pair, English-Hindi, and found that our classifier, whose accuracy varies across the translation directions, error classes, the morphological nature of the languages, and MT models, generally performs competently in the terminology translation classification task.

Original language: English
Article number: 273
Number of pages: 28
Journal: Information (Switzerland)
Volume: 10
Issue number: 9
Early online date: 30 Aug 2019
Publication status: Published - 01 Sept 2019
Externally published: Yes

Bibliographical note

Funding Information:
The ADAPT Centre for Digital Content Technology is funded under the Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106) and is co-funded under the European Regional Development Fund. This project has partially received funding from the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie Grant Agreement No. 713567, and the publication has emanated from research supported in part by a research grant from SFI under Grant Number 13/RC/2077.

Publisher Copyright:
© 2019 by the authors.

Keywords

  • Machine translation
  • Neural machine translation
  • Phrase-based statistical machine translation
  • Terminology translation
  • Terminology translation evaluation

ASJC Scopus subject areas

  • Information Systems
