TY - GEN
T1 - Classification decision combination for text categorization: An experimental study
AU - Bi, YX
AU - Bell, D
AU - Wang, Hui
AU - Guo, GD
AU - Dubitzky, Werner
N1 - 15th International Conference on Database and Expert Systems Applications (DEXA 2004), Zaragoza, SPAIN, AUG 30-SEP 03, 2004; DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS ; Conference date: 30-08-2004 Through 03-09-2004
PY - 2004
Y1 - 2004
N2 - This study investigates the combination of four classification methods for text categorization through experimental comparisons. The methods are the Support Vector Machine (SVM), k-nearest neighbours (kNN), the kNN model-based approach (kNNM), and Rocchio. We first review these learning methods and the method for combining the classifiers, and then present experimental results on the 20-newsgroup benchmark collection, with an emphasis on average group performance, that is, on the effectiveness of combining multiple classifiers on each category. To explain why combining the best and the second-best classifiers can achieve better performance, we propose an empirical measure called closeness as a basis for our experiments. Based on our empirical study, we verify the hypothesis that when a classifier has high closeness to the best classifier, their combination can achieve better performance.
M3 - Conference contribution
T3 - Lecture Notes in Computer Science
SP - 222
EP - 231
BT - Database and Expert Systems Applications, Proceedings
ER -