Abstract
General-purpose emotion lexicons (GPELs) that associate words with emotion categories remain a valuable resource for emotion detection. However, the static and formal nature of their vocabularies makes them inadequate for detecting emotions in domains that are inherently dynamic. This calls for lexicons that not only adapt to the lexical variations in a domain but also provide finer-grained quantitative estimates that accurately capture word-emotion associations. In this article, the authors demonstrate how to harness labeled emotion text (such as blogs and news headlines) and weakly labeled emotion text (such as tweets) to learn a word-emotion association lexicon by jointly modeling the emotionality and neutrality of words with a generative unigram mixture model (UMM). Empirical evaluation confirms that UMM-generated emotion language models (topics) have significantly lower perplexity than those from state-of-the-art generative models such as supervised Latent Dirichlet Allocation (sLDA). Further emotion detection tasks, involving word-emotion classification and document-emotion ranking, confirm that the UMM lexicon significantly outperforms GPELs as well as state-of-the-art domain-specific lexicons.
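To make the mixture-model idea named in the abstract concrete, below is a minimal EM sketch of a two-component unigram mixture: tokens in documents labeled with an emotion are assumed to be drawn either from that emotion's language model or from a shared neutral/background model. This is a generic illustration of the technique, not the authors' exact estimator; the function name, the fixed mixing weight `lam`, and the toy data are illustrative assumptions.

```python
from collections import Counter

def learn_emotion_model(docs, neutral_model, lam=0.7, iters=20):
    """EM for a two-component unigram mixture (illustrative sketch).
    Each token in emotion-labeled docs is generated either by the
    emotion's language model (prob lam) or by a fixed neutral model
    (prob 1 - lam). Returns P(w | emotion); its ratio against the
    neutral model can serve as a word-emotion association score."""
    counts = Counter(w for d in docs for w in d)
    vocab = list(counts)
    total = sum(counts.values())
    # Initialize the emotion model from raw relative frequencies.
    theta = {w: c / total for w, c in counts.items()}
    for _ in range(iters):
        # E-step: posterior that each word type came from the emotion model.
        resp = {
            w: lam * theta[w]
            / (lam * theta[w] + (1 - lam) * neutral_model.get(w, 1e-9))
            for w in vocab
        }
        # M-step: re-estimate the emotion model from weighted counts.
        norm = sum(resp[w] * counts[w] for w in vocab)
        theta = {w: resp[w] * counts[w] / norm for w in vocab}
    return theta

# Toy usage: "joy"-labeled snippets against a uniform neutral model.
docs = [["so", "happy", "today"], ["happy", "and", "delighted"], ["so", "delighted"]]
neutral = {w: 1 / 7 for w in {"so", "happy", "today", "and", "delighted", "the", "a"}}
joy_model = learn_emotion_model(docs, neutral)
print(sorted(joy_model.items(), key=lambda kv: -kv[1])[:3])
```

Because common function words are already well explained by the neutral model, EM shifts the emotion model's mass toward genuinely emotion-bearing words (here, "happy" and "delighted"), which is the intuition behind jointly modeling emotionality and neutrality.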
| Original language | English |
|---|---|
| Pages (from-to) | 102-108 |
| Number of pages | 7 |
| Journal | IEEE Intelligent Systems |
| Volume | 32 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 13 Feb 2017 |