Structure-based supervised term weighting and regularization for text classification

Niloofer Shanavas*, Hui Wang, Zhiwei Lin, Glenn Hawe

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Text documents contain rich information that can be useful for many tasks. How to utilise this information effectively and efficiently for tasks such as text classification is still an active research topic. One approach is to weight the terms in a text document according to their relevance to the classification task at hand. Another approach is to use the structural information in a text document to regularize learning so that the learned model is more accurate. An important question is whether the two approaches can be combined to achieve better performance. This paper presents a novel method for utilising the rich information in texts. We use supervised term weighting, which utilises the class information in a set of pre-classified training documents, so the resulting term weights are class-specific. We also use structured regularization, which incorporates structural information into the learning process. A graph is built for each class from the pre-classified training documents, and the structural information in these graphs is used both to calculate the supervised term weights and to define the groups for structured regularization. Experimental results on six text classification tasks show that utilising the structural information in text for both weighting and regularization increases classification accuracy. Using a graph-based text representation for supervised term weighting and structured regularization can build a compact model with a considerable improvement in text classification performance.
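The abstract describes the pipeline only at a high level, so the Python sketch below illustrates one possible reading of it. Every concrete choice here is our assumption, not the paper's method: each class graph is taken to be a sliding-window co-occurrence graph, "structural information" is taken to be normalised degree centrality, the weight formula in `class_term_weights` is hypothetical, and each regularization group is assumed to be a term plus its graph neighbours, scored with a standard (overlapping) group-lasso penalty. All function names are illustrative.

```python
import math


def build_graph(docs, window=2):
    """Undirected co-occurrence graph (dict: term -> set of neighbours).

    Assumption: each token is linked to the next `window - 1` tokens
    of the same document.
    """
    graph = {}
    for tokens in docs:
        for i, term in enumerate(tokens):
            graph.setdefault(term, set())
            for other in tokens[i + 1 : i + window]:
                if other != term:
                    graph[term].add(other)
                    graph.setdefault(other, set()).add(term)
    return graph


def degree_centrality(graph):
    """Normalised degree centrality: number of neighbours / (n - 1)."""
    n = len(graph)
    if n < 2:
        return {t: 0.0 for t in graph}
    return {t: len(nbrs) / (n - 1) for t, nbrs in graph.items()}


def class_term_weights(docs_by_class, window=2, eps=1e-2):
    """Hypothetical class-specific weight (not the paper's formula).

    Compares a term's centrality in its own class graph with its highest
    centrality in any other class graph. Terms absent from the class graph
    get the neutral weight log2(2) = 1.
    """
    cents = {c: degree_centrality(build_graph(d, window))
             for c, d in docs_by_class.items()}
    vocab = set().union(*cents.values())  # iterating a dict yields its terms
    weights = {}
    for c, own_cent in cents.items():
        weights[c] = {}
        for t in vocab:
            own = own_cent.get(t, 0.0)
            others = max((cents[o].get(t, 0.0) for o in cents if o != c),
                         default=0.0)
            weights[c][t] = math.log2(2.0 + own / max(others, eps))
    return weights


def neighbourhood_groups(graph):
    """Assumed grouping: one (overlapping) group per term, term + neighbours."""
    return [sorted({t} | nbrs) for t, nbrs in graph.items()]


def group_lasso_penalty(coef, groups, lam=1.0):
    """lam * sum over groups of the L2 norm of the coefficients in that group."""
    return lam * sum(math.sqrt(sum(coef.get(t, 0.0) ** 2 for t in g))
                     for g in groups)


# Toy, made-up training data: two classes of already-tokenised documents.
docs_by_class = {
    "sport": [["football", "match", "goal"], ["match", "goal", "win"]],
    "finance": [["stock", "market", "price"], ["market", "price", "fall"]],
}
weights = class_term_weights(docs_by_class)
sport_groups = neighbourhood_groups(build_graph(docs_by_class["sport"]))
print(weights["sport"]["goal"], weights["sport"]["stock"])
print(group_lasso_penalty({"goal": 3.0, "match": 4.0}, sport_groups))  # 17.0
```

In this toy example, "goal" is central in the sport graph and absent from the finance graph, so it receives a large sport weight, while "stock" gets the neutral weight 1.0 for sport. The group penalty sums the L2 norm of each neighbourhood's coefficients, which encourages the coefficients of co-occurring terms to be shrunk together rather than independently.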

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 24th International Conference on Applications of Natural Language to Information Systems, NLDB 2019, Proceedings
Editors: Elisabeth Métais, Farid Meziane, Sunil Vadera, Vijayan Sugumaran, Mohamad Saraee
Publisher: Springer Verlag
Pages: 105-117
Number of pages: 13
ISBN (Electronic): 9783030232818
ISBN (Print): 9783030232801
Publication status: Published - 21 Jun 2019
Externally published: Yes
Event: 24th International Conference on Application of Natural Language to Information Systems, NLDB 2019 - Salford, United Kingdom
Duration: 26 Jun 2019 to 28 Jun 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11608 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International Conference on Application of Natural Language to Information Systems, NLDB 2019
Country/Territory: United Kingdom
City: Salford
Period: 26/06/2019 to 28/06/2019

Bibliographical note

Publisher Copyright:
© 2019, Springer Nature Switzerland AG.

Keywords

  • Classification
  • Graph-based text representation
  • Node centrality
  • Structured regularization
  • Supervised term weighting
  • Text mining

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
