Abstract

Traditional approaches to sentiment analysis on Twitter have largely focused on identifying sentiment polarity and intensity at the tweet level. One problem with this kind of binary classification is that the output is usually a single label, ‘1’ (positive) or ‘0’ (negative), for the whole text of the tweet, regardless of whether the text refers to one entity or several. Where a single tweet refers to multiple entities, a more fine-grained approach is needed to distinguish the sentiment associated with each individual entity. With this in mind, the key aim of this research is to investigate how entities and their descriptor words (for example, adjectives, verbs or adverbs) can be used to identify the sentiment of a tweet towards each entity when more than one entity is present.
This task has been approached with a hybrid method. For tweets that contain more than one entity, the popular sentiment lexicon SentiWordNet 3.0 is used to score related descriptor words that occur within two words of an entity. SentiWordNet was chosen because it has been shown to perform better than other sentiment lexicons (Taboada et al., 2011). The remaining tweets, which contain only one entity, are scored using the word-embedding method Word2Vec.
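The window-based lexicon scoring described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes a symmetric window of two tokens on either side of each entity mention, uses a small hypothetical `TOY_LEXICON` (with invented scores) in place of SentiWordNet 3.0, and skips the part-of-speech filtering that would restrict the window to adjectives, verbs and adverbs. A real lookup could instead go through NLTK's `sentiwordnet` corpus reader, whose `senti_synsets()` entries expose `pos_score()` and `neg_score()`.

```python
# Hypothetical stand-in for SentiWordNet 3.0: word -> (positive, negative).
# Scores are invented for illustration; real SentiWordNet scores are per word sense.
TOY_LEXICON = {
    "great": (0.75, 0.0),
    "fast": (0.5, 0.0),
    "slow": (0.0, 0.5),
    "terrible": (0.0, 0.625),
}


def entity_scores(tokens, entities, lexicon=TOY_LEXICON, window=2):
    """Net score (positive - negative) per entity, summed over lexicon words
    within `window` tokens before or after each mention of that entity."""
    scores = {}
    for i, token in enumerate(tokens):
        if token not in entities:
            continue
        # Assumed symmetric window: up to `window` tokens on each side.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        net = 0.0
        for word in context:
            pos, neg = lexicon.get(word, (0.0, 0.0))  # unknown words score 0
            net += pos - neg
        scores[token] = scores.get(token, 0.0) + net
    return scores


tweet = "the iphone looks great and the android is terrible".split()
print(entity_scores(tweet, {"iphone", "android"}))
# {'iphone': 0.75, 'android': -0.625}
```

Because each mention gets its own window, the two entities receive opposite scores rather than one tweet-level label. Widening the window to three tokens would pull "great" into the context of "android" as well (giving it 0.125), which illustrates the trade-off the window size controls.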
This research therefore combines word embeddings and a sentiment lexicon in a hybrid approach. The findings demonstrate that integrating a word-embedding approach for single-entity tweets with a sentiment-lexicon approach for multi-entity tweets improved the accuracy of sentiment scoring on Twitter texts over the lexicon-based baseline.
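The hybrid routing summarised above, where multi-entity tweets go to the lexicon scorer and single-entity tweets go to the embedding scorer, might look like the sketch below. The abstract does not specify how Word2Vec vectors are turned into a sentiment score, so `embedding_score` assumes one simple scheme: average the tweet's word vectors, then take the cosine similarity to a positive seed word minus that to a negative seed word. `TOY_VECTORS` is a hypothetical three-dimensional stand-in for a trained model (for example, gensim's `KeyedVectors`), and `TOY_LEXICON` is again a hypothetical stand-in for SentiWordNet with invented scores.

```python
import math

# Hypothetical 3-d vectors standing in for a trained Word2Vec model.
TOY_VECTORS = {
    "good": [1.0, 0.0, 0.0],
    "bad": [0.0, 1.0, 0.0],
    "love": [0.9, 0.1, 0.0],
    "hate": [0.1, 0.9, 0.0],
    "phone": [0.0, 0.0, 1.0],
}
# Hypothetical stand-in for SentiWordNet: word -> (positive, negative).
TOY_LEXICON = {"great": (0.75, 0.0), "terrible": (0.0, 0.625)}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_score(tokens, vectors=TOY_VECTORS):
    """Assumed scheme: mean word vector, compared with 'good' and 'bad' seeds."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return 0.0  # no in-vocabulary words: treat as neutral
    mean = [sum(column) / len(known) for column in zip(*known)]
    return cosine(mean, vectors["good"]) - cosine(mean, vectors["bad"])


def lexicon_scores(tokens, entities, lexicon=TOY_LEXICON, window=2):
    """Net lexicon score of words within `window` tokens of each entity mention."""
    scores = {}
    for i, token in enumerate(tokens):
        if token in entities:
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            net = sum(lexicon.get(w, (0.0, 0.0))[0] - lexicon.get(w, (0.0, 0.0))[1]
                      for w in context)
            scores[token] = scores.get(token, 0.0) + net
    return scores


def score_tweet(tokens, entities):
    """Route a tweet to the lexicon or embedding scorer by distinct entity count."""
    present = {t for t in tokens if t in entities}
    if len(present) > 1:
        return lexicon_scores(tokens, present)
    # One entity (or none): a single embedding-based score for the whole tweet.
    return {e: embedding_score(tokens) for e in present}


brands = {"iphone", "android", "phone"}
print(score_tweet("the iphone looks great and the android is terrible".split(), brands))
# {'iphone': 0.75, 'android': -0.625}
print(score_tweet("i love this phone".split(), brands))
# positive: 'love' pulls the mean vector towards 'good'
```

Counting distinct entities with a set means a tweet that mentions the same entity twice is still routed to the embedding scorer as a single-entity tweet.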
Date of Award: Jul 2019
Supervisors: Deepak Padmanabhan and Paul Miller