Sarcasm Detection on Czech and English Twitter

Tomáš Ptáček, Ivan Habernal, Jun Hong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a machine learning approach to sarcasm detection on Twitter in two languages – English and Czech. Although there has been some research in sarcasm detection in languages other than English (e.g., Dutch, Italian, and Brazilian Portuguese), our work is the first attempt at sarcasm detection in the Czech language. We created a large Czech Twitter corpus consisting of 7,000 manually-labeled tweets and provide it to the community. We evaluate two classifiers with various combinations of features on both the Czech and English datasets. Furthermore, we tackle the issues of rich Czech morphology by examining different preprocessing techniques. Experiments show that our language-independent approach significantly outperforms adapted state-of-the-art methods in English (F-measure 0.947) and also represents a strong baseline for further research in Czech (F-measure 0.582).
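
The F-measure figures quoted above combine precision and recall into a single score. As a minimal sketch, assuming the balanced (F1) variant computed for a single class (the abstract does not say which variant or averaging scheme was used), it can be computed from confusion counts as follows. The function name and the example counts are illustrative and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure for one class from true-positive, false-positive
    and false-negative counts. beta=1.0 gives the balanced F1 score.
    Illustrative helper only; not the authors' evaluation code."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up counts: 8 sarcastic tweets found, 2 false alarms, 2 missed.
score = f_measure(tp=8, fp=2, fn=2)
print(round(score, 3))
```

With these made-up counts, precision and recall are both 0.8, so the balanced F-measure is also 0.8.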
Original language: English
Title of host publication: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
Pages: 213-223
Number of pages: 11
Publication status: Published - Aug 2014
Event: COLING 2014, the 25th International Conference on Computational Linguistics - Dublin, Ireland
Duration: 23 Aug 2014 to 29 Aug 2014

Conference

Conference: COLING 2014, the 25th International Conference on Computational Linguistics
Country: Ireland
City: Dublin
Period: 23/08/2014 to 29/08/2014

Cite this

Ptáček, T., Habernal, I., & Hong, J. (2014). Sarcasm Detection on Czech and English Twitter. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers (pp. 213-223). [C14-1022]
Ptáček, Tomáš ; Habernal, Ivan ; Hong, Jun. / Sarcasm Detection on Czech and English Twitter. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. 2014. pp. 213-223
@inproceedings{38f08db1c9654aaa97a9dcc2e70a1f2e,
title = "Sarcasm Detection on Czech and English Twitter",
author = "Tom{\'a}{\v s} Pt{\'a}{\v c}ek and Ivan Habernal and Jun Hong",
year = "2014",
month = "8",
language = "English",
isbn = "9781941643266",
pages = "213--223",
booktitle = "Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers"
}

Ptáček, T, Habernal, I & Hong, J 2014, Sarcasm Detection on Czech and English Twitter. in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, C14-1022, pp. 213-223, COLING 2014, the 25th International Conference on Computational Linguistics, Dublin, Ireland, 23/08/2014.

Sarcasm Detection on Czech and English Twitter. / Ptáček, Tomáš; Habernal, Ivan; Hong, Jun.

Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. 2014. p. 213-223 C14-1022.

TY - GEN

T1 - Sarcasm Detection on Czech and English Twitter

AU - Ptáček, Tomáš

AU - Habernal, Ivan

AU - Hong, Jun

PY - 2014/8

Y1 - 2014/8

M3 - Conference contribution

SN - 9781941643266

SP - 213

EP - 223

BT - Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

ER -

Ptáček T, Habernal I, Hong J. Sarcasm Detection on Czech and English Twitter. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. 2014. p. 213-223. C14-1022