
BERT-based language identification in code-mix Kannada-English text at the CoLI-Kanglish shared task

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Language identification in code-mixed text has recently attracted research interest, driven by the extensive use of social media. People who speak multiple languages tend to mix them when communicating with each other. Identifying the languages in such code-mixed text has become necessary for detecting hate speech, fake news, misinformation, and disinformation, and for tasks such as sentiment analysis. In this work, we propose a BERT-based approach to language identification for the CoLI-Kanglish shared task at ICON 2022. Our approach achieved a weighted average F1 score of 86% and a macro average F1 score of 57% on the test set.
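The gap between the weighted and macro averages reported above is typical of imbalanced tag sets: the weighted average scales each class's F1 by its support, while the macro average counts every class equally, so a poorly predicted rare class pulls the macro score down much more. The following is a minimal sketch of the two averages in plain Python. The tag names (`kn`, `en`, `mixed`) and the toy label lists are hypothetical illustrations, not the shared-task data, tag set, or the authors' evaluation code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's number of true instances."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * n for label, n in support.items()) / len(y_true)


# Hypothetical toy example: one frequent tag and two rare ones.
y_true = ["kn"] * 8 + ["en"] + ["mixed"]
y_pred = ["kn"] * 8 + ["kn"] + ["mixed"]  # the single "en" token is missed

print(f"macro F1:    {macro_f1(y_true, y_pred):.3f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

In this toy case the single missed rare-class token zeroes out one class's F1, which lowers the macro average far more than the weighted one.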

Original language: English
Title of host publication: Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts
Publisher: Association for Computational Linguistics
Pages: 12-17
Number of pages: 6
ISBN (Electronic): 9781959429388
Publication status: Published - Dec 2022
Event: 19th International Conference on Natural Language Processing (ICON 2022) - Delhi, India
Duration: 15 Dec 2022 → 18 Dec 2022

Conference

Conference: 19th International Conference on Natural Language Processing (ICON 2022)
Country/Territory: India
City: Delhi
Period: 15/12/2022 → 18/12/2022
