Knowledge-driven graph similarity for text classification

Niloofer Shanavas*, Hui Wang, Zhiwei Lin, Glenn Hawe

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Automatic text classification using machine learning is significantly affected by the text representation model. Structural information in text is necessary for natural language understanding, but it is usually ignored in vector-based representations. In this paper, we present a graph kernel-based text classification framework that utilises the structural information in text effectively through the weighting and enrichment of a graph-based representation. We introduce weighted co-occurrence graphs to represent text documents; these graphs weight the terms and their dependencies based on their relevance to text classification. We propose a novel method to automatically enrich the weighted graphs using semantic knowledge in the form of a word similarity matrix. The similarity between enriched graphs, which we call knowledge-driven graph similarity, is calculated using a graph kernel. The semantic knowledge in the enriched graphs ensures that the graph kernel goes beyond exact matching of terms and patterns to compute the semantic similarity of documents. In experiments on sentiment classification and topic classification tasks, our knowledge-driven similarity measure significantly outperforms the baseline text similarity measures on five benchmark text classification datasets.
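
The pipeline summarised in the abstract (weighted co-occurrence graph, enrichment with a word similarity matrix, graph kernel) can be illustrated with a minimal sketch. The abstract does not give the exact weighting scheme, enrichment rule, or kernel, so everything below is an assumption made for illustration: the function names, the sliding-window size, the optional `term_weight` dictionary, the enrichment `S A S^T`, the normalised edge-matching kernel, and the toy similarity values are all hypothetical and are not the authors' method.

```python
import numpy as np


def cooccurrence_matrix(tokens, vocab, window=2, term_weight=None):
    """Symmetric weighted co-occurrence adjacency matrix over a fixed vocabulary.

    Assumption: an edge weight is the product of the two term weights
    (1.0 when no weights are supplied), summed over co-occurrences.
    """
    index = {term: i for i, term in enumerate(vocab)}
    adj = np.zeros((len(vocab), len(vocab)))
    for pos, term in enumerate(tokens):
        if term not in index:
            continue
        # Terms following `term` inside a window of `window` tokens.
        for other in tokens[pos + 1:pos + window]:
            if other in index and other != term:
                w = 1.0
                if term_weight is not None:
                    w = term_weight.get(term, 1.0) * term_weight.get(other, 1.0)
                adj[index[term], index[other]] += w
                adj[index[other], index[term]] += w
    return adj


def enrich(adj, sim):
    """Spread edge weight to semantically similar terms (assumed rule: S A S^T)."""
    return sim @ adj @ sim.T


def graph_kernel(adj_a, adj_b):
    """Normalised edge-matching kernel: cosine of the two adjacency matrices."""
    denom = np.linalg.norm(adj_a) * np.linalg.norm(adj_b)
    return float(np.sum(adj_a * adj_b) / denom) if denom > 0 else 0.0


# Toy example: two documents that share no terms.
vocab = ["good", "great", "movie", "film"]
# Hypothetical word similarity matrix (values chosen only for illustration).
S = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
])
a1 = cooccurrence_matrix(["good", "movie"], vocab)
a2 = cooccurrence_matrix(["great", "film"], vocab)

plain = graph_kernel(a1, a2)                            # no shared edges
enriched = graph_kernel(enrich(a1, S), enrich(a2, S))   # edges overlap after enrichment
print(plain, enriched)
```

In this toy case the unenriched graphs have no edge in common, so exact matching gives a similarity of zero, while the enriched graphs overlap because "good"/"great" and "movie"/"film" are marked as similar. A kernel matrix built this way could be passed to an SVM with a precomputed kernel, which matches the SVM keyword listed for the paper, though the abstract does not describe that step.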

Original language: English
Number of pages: 15
Journal: International Journal of Machine Learning and Cybernetics
Early online date: 19 Nov 2020
Publication status: Early online date - 19 Nov 2020

Bibliographical note

Funding Information:
The authors would like to acknowledge the support from Ulster University through the Vice Chancellor’s Research Scholarship (VCRS) Award.

Publisher Copyright:
© 2020, The Author(s).

Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.

Keywords

  • Automatic text classification
  • Document similarity measure
  • Graph enrichment
  • Graph kernels
  • Graph-based text representation
  • Supervised term weighting
  • SVM

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
