Abstract
Language identification in code-mixed text has recently attracted research interest because of the extensive use of social media. Multilingual speakers often mix languages when communicating with each other, so identifying the languages in such code-mixed text has become necessary for detecting hate speech, fake news, misinformation, and disinformation, and for tasks such as sentiment analysis. In this work, we propose a BERT-based approach for word-level language identification in the CoLI-Kanglish shared task at ICON 2022. Our approach achieved a weighted average F1 score of 86% and a macro average F1 score of 57% on the test set.
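The two reported scores average per-label F1 differently. The weighted average scales each label's F1 by its support (the number of gold words carrying that label), while the macro average gives every label equal weight. A weighted score well above the macro score therefore generally means that frequent labels are tagged well and at least some infrequent labels are tagged poorly. The minimal Python sketch below illustrates this under stated assumptions: the labels `kn`, `en`, and `mixed` and all counts are hypothetical toy data, not the CoLI-Kanglish tag set or the paper's predictions, and the helper functions are illustrative, intended to mirror the `average="macro"` and `average="weighted"` definitions used by scikit-learn's `f1_score`.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per label
    pred_count = Counter(y_pred)  # how often each label was predicted
    true_count = Counter(y_true)  # support: how often each label occurs in the gold data
    scores = {}
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (f1, true_count[label])
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1: every label counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-label F1: frequent labels dominate."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(support for _, support in scores.values())
    return sum(f1 * support for f1, support in scores.values()) / total


# Hypothetical word-level gold labels and predictions (toy data only).
y_true = ["kn"] * 8 + ["en"] * 4 + ["mixed"] * 2
y_pred = ["kn"] * 8 + ["en"] * 4 + ["kn", "en"]  # both rare "mixed" words are mislabeled

print(f"macro F1    = {macro_f1(y_true, y_pred):.3f}")     # macro F1    = 0.610
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")  # weighted F1 = 0.792
```

In this toy case the rare `mixed` label is never predicted correctly, so its F1 of 0 accounts for one third of the macro average but only 2/14 of the weighted average.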
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts |
| Publisher | Association for Computational Linguistics |
| Pages | 12-17 |
| Number of pages | 6 |
| ISBN (Electronic) | 9781959429388 |
| Publication status | Published - Dec 2022 |
| Event | 19th International Conference on Natural Language Processing (ICON 2022), Delhi, India; duration: 15 Dec 2022 → 18 Dec 2022 |
Conference
| Conference | 19th International Conference on Natural Language Processing (ICON 2022) |
|---|---|
| Country/Territory | India |
| City | Delhi |
| Period | 15/12/2022 → 18/12/2022 |
Fingerprint
Dive into the research topics of 'BERT-based language identification in code-mix Kannada-English text at the CoLI-Kanglish shared task'. Together they form a unique fingerprint.