Ontology-based enriched concept graphs for medical document classification

Niloofer Shanavas*, Hui Wang, Zhiwei Lin, Glenn Hawe

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)


The rapidly increasing volume of medical text data, including biomedical literature and clinical records, poses difficulties for biomedical researchers and clinical practitioners. Automatic text classification is an important means of managing medical text data. The main challenge in medical text classification is the complex terminology used in these documents. It is therefore critical to handle synonymy, polysemy, and multi-word concepts so that classification is based on the meaning of the documents. Solving this problem of complex terminology helps in building systems with better access to relevant data, resulting in more effective utilisation of existing information. In this paper, we present a simple and effective approach to address this challenge. A concept graph is automatically constructed and enriched for each medical text document with the help of a domain-specific similarity matrix that is built using Unified Medical Language System (UMLS) concepts in the training documents. Medical text documents are compared based on their enriched concept graphs using a graph kernel, and classification is then performed on the result of this comparison. The benefit of this approach is that it allows domain knowledge to be incorporated into the classification framework. Experiments on the classification of biomedical abstracts and clinical reports show the effectiveness of the proposed approach. Measured by precision, recall, and F1-score, our method achieves significantly higher classification performance than other widely used similarity measures for similarity-based text classification.
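The pipeline described in the abstract (concept graph, enrichment through a similarity matrix, graph-kernel comparison) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the `SIMILARITY` scores are invented stand-ins for a UMLS-derived matrix, the co-occurrence window, the enrichment rule, and the threshold are hypothetical, and a simple normalised edge-overlap kernel is swapped in for whatever graph kernel the paper uses.

```python
from math import sqrt

# Hypothetical concept-to-concept similarity scores. In the paper these would
# come from a domain-specific matrix built from UMLS concepts; the values
# below are invented for illustration only.
SIMILARITY = {
    ("heart attack", "myocardial infarction"): 0.9,
}


def sim(a, b):
    """Symmetric lookup in the hypothetical similarity table."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def concept_graph(concepts, window=2):
    """Build a weighted co-occurrence graph from an ordered list of concepts.

    Two concepts are linked when they occur within `window` positions of
    each other. The graph is a dict mapping sorted concept pairs to weights.
    """
    edges = {}
    for i, a in enumerate(concepts):
        for b in concepts[i + 1 : i + 1 + window]:
            if a != b:
                key = tuple(sorted((a, b)))
                edges[key] = edges.get(key, 0.0) + 1.0
    return edges


def enrich(edges, vocabulary, threshold=0.5):
    """Add edges for concepts that are similar to an edge's endpoints.

    For an edge (a, b) with weight w and a vocabulary concept c whose
    similarity s to a is at least `threshold`, add an edge (c, b) with
    weight w * s, and likewise for concepts similar to b.
    """
    enriched = dict(edges)
    for (a, b), w in edges.items():
        for c in vocabulary:
            if c in (a, b):
                continue
            for src, other in ((a, b), (b, a)):
                s = sim(src, c)
                if s >= threshold:
                    key = tuple(sorted((c, other)))
                    enriched[key] = enriched.get(key, 0.0) + w * s
    return enriched


def edge_kernel(g1, g2):
    """Normalised dot product of edge weights (a simple stand-in graph kernel)."""
    dot = sum(w * g2.get(e, 0.0) for e, w in g1.items())
    n1 = sqrt(sum(w * w for w in g1.values()))
    n2 = sqrt(sum(w * w for w in g2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


# Two toy documents that use different terms for the same condition.
doc_a = ["heart attack", "aspirin"]
doc_b = ["myocardial infarction", "aspirin"]
vocabulary = set(doc_a) | set(doc_b)

plain_a, plain_b = concept_graph(doc_a), concept_graph(doc_b)
rich_a, rich_b = enrich(plain_a, vocabulary), enrich(plain_b, vocabulary)

print("kernel without enrichment:", edge_kernel(plain_a, plain_b))
print("kernel with enrichment:   ", edge_kernel(rich_a, rich_b))
```

In this toy setting the two documents share no edge until enrichment links the synonymous concepts, which is the effect the abstract attributes to the similarity matrix. A matrix of such kernel values between all document pairs could then be passed to a kernel classifier such as an SVM, which appears among the paper's keywords.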
Original language: English
Pages (from-to): 172-181
Number of pages: 10
Journal: Information Sciences
Early online date: 14 Mar 2020
Publication status: Published - Jul 2020
Externally published: Yes

Bibliographical note

Publisher Copyright: © 2020

Copyright 2020 Elsevier B.V., All rights reserved.


Keywords

  • Graph kernel
  • Graph-based text representation
  • Medical text classification
  • Similarity measure
  • SVM
  • UMLS

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

