An kNN Model-Based Approach and Its Application in Text Categorization. CICLing 2004: 559-570

Gongde Guo, Hui Wang, David A. Bell, Yaxin Bi, Kieran Greer

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

An investigation has been conducted on two well-known similarity-based learning approaches to text categorization: the k-nearest neighbor (k-NN) classifier and the Rocchio classifier. After identifying the weaknesses and strengths of each technique, we propose a new classifier, called the kNN model-based classifier, that unifies the strengths of the k-NN and Rocchio classifiers and adapts to the characteristics of text categorization problems. A prototype text categorization system has been implemented and evaluated on two common document corpora, namely the 20-newsgroup collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the kNN model-based approach outperforms both the k-NN and Rocchio classifiers.
Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing Lecture Notes in Computer Science
Place of Publication: Switzerland
Publisher: Springer
Pages: 559-570
Number of pages: 12
ISBN (Print): 978-3-540-21006-1
Publication status: Published - 2004
