Topic modeling for unsupervised concept extraction and document ranking

V. S. Anoop*, S. Asharaf, P. Deepak

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

This paper proposes a framework that induces semantically rich concepts from the topics generated probabilistically by a topic modeling algorithm. The method uses an off-the-shelf tool to extract noun phrases, as word bi-grams and tri-grams, from a static document corpus and then models the topics using the Latent Dirichlet Allocation (LDA) algorithm. We also show that a small extension to the framework improves the ranking of documents in a large collection, a well-studied problem in information retrieval. Experiments on three real-world datasets show that the framework outperforms state-of-the-art methods for extracting concepts and ranking documents. Compared with the chosen baselines, the proposed concept extraction method increased F-measure by 16.65% to 22.04%, and the proposed topic-modeling-guided document retrieval method increased F-measure by 7.6% to 16.61%.
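As a rough illustration of two ingredients the abstract names, the sketch below counts word bi-grams and tri-grams in a document and computes an F-measure between predicted and gold concept sets. It is not the authors' implementation. The function names (`tokenize`, `ngrams`, `candidate_phrases`, `f_measure`) and the sample text are hypothetical, the noun-phrase filtering and LDA steps are left out, and the balanced F1 form is an assumption, since the abstract does not say which F-measure variant was used.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous n-token phrase as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_phrases(text):
    """Count word bi-grams and tri-grams in one document.

    Assumption: every n-gram counts as a candidate. The paper keeps only
    noun phrases found by an off-the-shelf tool, which is not reproduced here.
    """
    tokens = tokenize(text)
    return Counter(ngrams(tokens, 2) + ngrams(tokens, 3))


def f_measure(predicted, gold):
    """Balanced F-measure (F1) between predicted and gold concept sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample document and concept lists, for illustration only.
doc = "topic modeling finds latent topics and topic modeling ranks documents"
counts = candidate_phrases(doc)
top = [phrase for phrase, _ in counts.most_common(1)]

predicted = ["topic modeling", "latent topics", "ranks documents"]
gold = ["topic modeling", "latent topics", "document ranking"]
score = f_measure(predicted, gold)
```

In this toy input the most frequent candidate is the repeated bi-gram "topic modeling", and two of the three predicted concepts match the gold set, so precision and recall are both 2/3. In the pipeline the abstract describes, the extracted phrases would then be passed to an LDA model rather than simply counted.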

Original language: English
Title of host publication: Proceedings of the Third International Symposium on Intelligent Systems Technologies and Applications (ISTA’17)
Editors: Sabu M. Thampi, Alex Pappachen James, Stefano Berretti, Jayanta Mukhopadhyay, Sushmita Mitra, Kuan-Ching Li
Publisher: Springer Verlag
Pages: 123-135
Number of pages: 13
ISBN (Electronic): 9783319683850
ISBN (Print): 9783319683843
DOIs
Publication status: Published - 20 Oct 2017
Event: 3rd International Symposium on Intelligent Systems Technologies and Applications, ISTA’17 - Udupi, India
Duration: 13 Sept 2017 to 16 Sept 2017

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 683
ISSN (Print): 2194-5357
ISSN (Electronic): 2194-5365

Conference

Conference: 3rd International Symposium on Intelligent Systems Technologies and Applications, ISTA’17
Country/Territory: India
City: Udupi
Period: 13/09/2017 to 16/09/2017

Bibliographical note

Publisher Copyright:
© Springer International Publishing AG 2018.

Keywords

  • Concept extraction
  • Document ranking
  • Latent Dirichlet allocation
  • Topic modeling

ASJC Scopus subject areas

  • Control and Systems Engineering
  • General Computer Science
