ReSCo-CC: unsupervised identification of key disinformation sentences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Disinformation is often presented in long textual articles, especially in domains such as health, as seen in relation to COVID-19. These articles typically contain a number of trustworthy sentences among which core disinformation sentences are scattered. In this paper, we propose a novel unsupervised task: identifying the sentences that contain key disinformation within a document already known to be untrustworthy. We design a three-phase statistical NLP solution for the task, which begins by embedding sentences within a bespoke feature space designed for the task. Sentences represented using those features are then clustered, after which the key sentences are identified through proximity scoring. We also curate a new dataset with sentence-level disinformation scorings to aid evaluation for this task; the dataset is being made publicly available to facilitate further research. Based on a comprehensive empirical evaluation against techniques from related tasks such as claim detection and summarization, as well as against simplified variants of our proposed approach, we show that our method identifies core disinformation effectively.
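The three phases named in the abstract (feature embedding, clustering, proximity scoring) can be illustrated with a small, self-contained Python sketch. The paper's bespoke feature space and its exact proximity score are not described on this page, so everything below is an assumption for illustration only: the two toy features in the hypothetical `features` function, plain k-means with farthest-first initialisation in `kmeans`, and the rule in the hypothetical `key_sentences` function that the smaller cluster holds the candidate disinformation (motivated only by the observation that such sentences are scattered among mostly trustworthy ones). It is not the ReSCo-CC method itself.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a deliberately simple tokenizer for illustration.
    return re.findall(r"[a-z']+", text.lower())


def features(sentences):
    """Phase 1 (hypothetical features, NOT the paper's bespoke feature space):
    (0) mean document frequency of the sentence's words, scaled by sentence count,
        a crude proxy for how 'on-topic' the sentence is;
    (1) fraction of the sentence's words that occur in no other sentence."""
    token_sets = [set(tokenize(s)) for s in sentences]
    doc_freq = Counter(w for ts in token_sets for w in ts)
    n = len(sentences)
    vecs = []
    for ts in token_sets:
        if not ts:
            vecs.append((0.0, 0.0))
            continue
        topical = sum(doc_freq[w] for w in ts) / (len(ts) * n)
        unique = sum(1 for w in ts if doc_freq[w] == 1) / len(ts)
        vecs.append((topical, unique))
    return vecs


def kmeans(points, k, iters=20):
    """Phase 2: Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


def key_sentences(sentences, top_n=1):
    """Phase 3: rank sentences by proximity to the centroid of the smaller cluster.
    Assumption: the minority cluster holds the candidate disinformation sentences."""
    points = features(sentences)
    labels, centroids = kmeans(points, k=2)
    sizes = Counter(labels)
    target = min(range(len(centroids)), key=lambda c: sizes.get(c, 0))
    # Closer to the target centroid => higher score (score is 1.0 at distance 0).
    scores = [1.0 / (1.0 + math.dist(p, centroids[target])) for p in points]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n]
```

For instance, on a four-sentence toy article in which three sentences describe a vaccine trial and the fourth claims that vaccines contain tracking microchips, the fourth sentence forms a cluster on its own and is ranked first. A real system would replace the two toy features with task-specific ones and would not fix the number of clusters at two.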

Original language: English
Title of host publication: Proceedings of the 22nd International Conference on Information Integration and Web-based Applications and Services (iiWAS2020)
Editors: Maria Indrawan-Santiago, Eric Pardede, Ivan Luiz Salvadori, Matthias Steinbauer, Ismail Khalil, Gabriele Kotsis
Publisher: Association for Computing Machinery
Pages: 47-54
ISBN (Print): 9781450389242
Publication status: Published - 27 Jan 2021
Event: 22nd International Conference on Information Integration and Web-based Applications and Services (iiWAS 2020) - virtual, online
Duration: 30 Nov 2020 to 2 Dec 2020
https://www.iiwas.org/conferences/iiwas2020/

Conference

Conference: 22nd International Conference on Information Integration and Web-based Applications and Services (iiWAS 2020)
City: virtual, online
Period: 30/11/2020 to 02/12/2020
