Abstract
The sheer volume of text documents has made the manual organization of text data a tedious task. Automatic text classification makes large document collections easier to handle by organising them automatically into predefined classes. The effectiveness and efficiency of automatic text classification depend largely on the way text documents are represented. A text document is usually viewed as a bag of terms (or words) and represented as a vector using the vector space model, where terms are assumed to be unordered and independent and term frequencies (or weights) are used in the representation. Graphs are another text representation scheme, one that captures the structure of terms in the text document, which is important for natural language. Weighting terms on the basis of a graph representation improves the performance of text classification. In this paper, we present a novel approach for graph-based supervised term weighting that takes into account information relevant to the classification task by using node centrality in co-occurrence graphs built from the labelled training documents. Our experimental evaluation of the proposed term weighting scheme on four benchmark datasets shows that the scheme consistently outperforms state-of-the-art term weighting methods for text classification.
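The general idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's actual scheme: the abstract does not specify the window size, the centrality measure, or how class information enters the weight, so the sliding window (`window`), the use of degree centrality, and the contrast in `supervised_weight` are all assumptions made here for illustration, and every function name is hypothetical.

```python
from collections import defaultdict


def cooccurrence_graph(docs, window=2):
    """Undirected co-occurrence graph: terms are nodes, and each term is
    linked to the distinct terms among the next `window - 1` tokens."""
    adj = defaultdict(set)
    for tokens in docs:
        for i, term in enumerate(tokens):
            neighbours = adj[term]  # also registers isolated terms as nodes
            for other in tokens[i + 1 : i + window]:
                if other != term:
                    neighbours.add(other)
                    adj[other].add(term)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n <= 1:
        return {term: 0.0 for term in adj}
    return {term: len(nbrs) / (n - 1) for term, nbrs in adj.items()}


def class_centralities(labelled_docs, window=2):
    """Build one co-occurrence graph per class label and return the
    centrality of every term within each class graph."""
    by_class = defaultdict(list)
    for tokens, label in labelled_docs:
        by_class[label].append(tokens)
    return {
        label: degree_centrality(cooccurrence_graph(docs, window))
        for label, docs in by_class.items()
    }


def supervised_weight(term, label, centralities):
    """ASSUMPTION (not from the paper): weight a term for a class by its
    centrality in that class graph minus its highest centrality in any
    other class graph."""
    own = centralities[label].get(term, 0.0)
    rest = max(
        (c.get(term, 0.0) for other, c in centralities.items() if other != label),
        default=0.0,
    )
    return own - rest


# Toy labelled training set of (tokens, class label) pairs; purely illustrative.
train = [
    (["graph", "based", "term", "weighting"], "ml"),
    (["term", "weighting", "for", "text"], "ml"),
    (["football", "match", "term"], "sport"),
]
cent = class_centralities(train)
```

In this toy data, a term such as "match" that is central in only one class graph receives a high weight for that class, while "term", which occurs in both classes, is discounted. Any other centrality measure or class-contrast function could be substituted without changing the overall structure.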
Original language | English |
---|---|
Title of host publication | Proceedings - 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016 |
Editors | Carlotta Domeniconi, Francesco Gullo, Francesco Bonchi, Josep Domingo-Ferrer, Ricardo Baeza-Yates, Zhi-Hua Zhou, Xindong Wu |
Publisher | IEEE Computer Society |
Pages | 1261-1268 |
Number of pages | 8 |
ISBN (Electronic) | 9781509054725 |
Publication status | Published - 02 Feb 2017 |
Externally published | Yes |
Event | 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016 - Barcelona, Spain. Duration: 12 Dec 2016 → 15 Dec 2016 |
Publication series
Name | IEEE International Conference on Data Mining Workshops, ICDMW |
---|---|
Volume | 0 |
ISSN (Print) | 2375-9232 |
ISSN (Electronic) | 2375-9259 |
Conference
Conference | 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016 |
---|---|
Country/Territory | Spain |
City | Barcelona |
Period | 12/12/2016 → 15/12/2016 |
Bibliographical note
Publisher Copyright: © 2016 IEEE.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
Keywords
- Automatic text classification
- Graph-based text representation
- Node centrality
- Supervised term weighting
ASJC Scopus subject areas
- Computer Science Applications
- Software