A Formalism for Relevance and Its Application in Feature Subset Selection

David A. Bell, Hui Wang

Research output: Contribution to journal › Article › peer-review

141 Citations (Scopus)

Abstract

The notion of relevance is used in many technical fields. In the areas of machine learning and data mining, for example, relevance is frequently used as a measure in feature subset selection (FSS). In previous studies, the interpretation of relevance has varied and its connection to FSS has been loose. In this paper a rigorous mathematical formalism is proposed for relevance, which is quantitative and normalized. To apply the formalism in FSS, a characterization is proposed for FSS: preservation of learning information and minimization of joint entropy. Based on the characterization, a tight connection between relevance and FSS is established: maximizing the relevance of features to the decision attribute, and the relevance of the decision attribute to the features. This connection is then used to design an algorithm for FSS. The algorithm is linear in the number of instances and quadratic in the number of features. The algorithm is evaluated using 23 public datasets, resulting in an improvement in prediction accuracy on 16 datasets, and a loss in accuracy on only 1 dataset. This provides evidence that both the formalism and its connection to FSS are sound.
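To make the idea of a quantitative, normalized relevance measure concrete, here is a minimal Python sketch. It is not taken from the paper: it assumes an entropy-based definition, relevance of X to Y = I(X;Y) / H(Y), which lies in [0, 1], and it pairs that with a simple greedy forward selection. The names `entropy`, `joint`, `relevance`, and `greedy_select` are hypothetical and chosen only for illustration. The search maximizes the relevance of the chosen features to the decision attribute and uses the reverse relevance only to break ties, so it approximates the two-sided criterion described in the abstract rather than reproducing the authors' algorithm.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def joint(columns):
    """Combine several columns into one column of tuples (a joint variable)."""
    return list(zip(*columns))


def relevance(x, y):
    """Assumed definition: I(X;Y) / H(Y); returns 0.0 when H(Y) is zero."""
    hy = entropy(y)
    if hy == 0:
        return 0.0
    mutual_info = entropy(x) + hy - entropy(joint([x, y]))
    return mutual_info / hy


def greedy_select(features, target, tol=1e-12):
    """Greedy forward selection over a dict of name -> column.

    Each round adds the feature that most increases the relevance of the
    chosen subset to the target; the reverse relevance (target to subset)
    breaks ties. The search stops when no candidate improves the score.
    """
    chosen, best = [], 0.0
    remaining = list(features)
    while remaining:
        scored = []
        for name in remaining:
            subset = joint([features[f] for f in chosen + [name]])
            scored.append((relevance(subset, target),
                           relevance(target, subset), name))
        forward, _, name = max(scored, key=lambda t: (t[0], t[1]))
        if forward <= best + tol:
            break
        chosen.append(name)
        best = forward
        remaining.remove(name)
    return chosen


# Toy data: the decision attribute simply copies feature "a".
features = {
    "a": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
    "noise": [0, 0, 0, 0],
}
target = [0, 0, 1, 1]
selected = greedy_select(features, target)
```

In this toy data only `a` carries information about the decision attribute, so the search stops after selecting it. Each call to `relevance` makes a single pass over the instances, and the greedy loop evaluates at most a quadratic number of candidate subsets in the number of features; this only loosely mirrors the complexity stated in the abstract.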
Original language: English
Pages (from-to): 179-195
Number of pages: 17
Journal: Machine Learning
Volume: 41
Publication status: Published - 01 Nov 2000
Externally published: Yes