Abstract
This paper proposes a framework that induces semantically rich concepts from topics generated probabilistically by a topic modeling algorithm. The method uses an off-the-shelf tool to extract noun phrases as word bi-grams and tri-grams from a static document corpus and then models topics using the Latent Dirichlet Allocation algorithm. Additionally, we show that a small extension to the framework can better rank documents in a large collection, a well-studied problem in information retrieval. Experiments on three real-world datasets show that the framework outperforms state-of-the-art methods for extracting concepts and ranking documents. Compared with the chosen baselines, the proposed concept extraction method increased f-measure by 16.65% to 22.04%, and the proposed topic-modeling-guided document retrieval method increased f-measure by 7.6% to 16.61%.
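The phrase-extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the off-the-shelf noun-phrase tool, so the code below swaps in a crude stopword filter as a stand-in, and the names `STOPWORDS` and `candidate_phrases` are hypothetical. It only shows how word bi-grams and tri-grams might be turned into single tokens before being passed to a topic model such as Latent Dirichlet Allocation.

```python
from typing import List, Sequence

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "is", "are", "to", "for", "by"}


def candidate_phrases(text: str, sizes: Sequence[int] = (2, 3)) -> List[str]:
    """Return word bi-grams and tri-grams that contain no stopword.

    This is a rough proxy for noun-phrase extraction; a real pipeline
    would use a part-of-speech tagger or chunker instead.
    """
    # Lowercase and strip surrounding punctuation from each whitespace token.
    words = [w.strip(".,;:!?()").lower() for w in text.split()]
    words = [w for w in words if w]

    phrases = []
    for n in sizes:
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if not any(w in STOPWORDS for w in gram):
                # Join with underscores so a topic model sees one token.
                phrases.append("_".join(gram))
    return phrases
```

Calling `candidate_phrases` on each document yields one token list per document; those lists could then serve as input to any LDA implementation, so that the resulting topics are distributions over phrases rather than single words.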
Original language | English |
---|---|
Title of host publication | Proceedings of the Third International Symposium on Intelligent Systems Technologies and Applications (ISTA’17) |
Editors | Sabu M. Thampi, Alex Pappachen James, Stefano Berretti, Jayanta Mukhopadhyay, Sushmita Mitra, Kuan-Ching Li |
Publisher | Springer Verlag |
Pages | 123-135 |
Number of pages | 13 |
ISBN (Electronic) | 9783319683850 |
ISBN (Print) | 9783319683843 |
Publication status | Published - 20 Oct 2017 |
Event | 3rd International Symposium on Intelligent Systems Technologies and Applications, ISTA’17 - Udupi, India. Duration: 13 Sept 2017 → 16 Sept 2017 |
Publication series
Name | Advances in Intelligent Systems and Computing |
---|---|
Volume | 683 |
ISSN (Print) | 2194-5357 |
ISSN (Electronic) | 2194-5365 |
Conference
Conference | 3rd International Symposium on Intelligent Systems Technologies and Applications, ISTA’17 |
---|---|
Country/Territory | India |
City | Udupi |
Period | 13/09/2017 → 16/09/2017 |
Bibliographical note
Publisher Copyright: © Springer International Publishing AG 2018.
Keywords
- Concept extraction
- Document ranking
- Latent Dirichlet allocation
- Topic modeling
ASJC Scopus subject areas
- Control and Systems Engineering
- General Computer Science