Abstract
When applying a machine-learning approach to develop classifiers in a new domain, an important question is what measurements to take and how they will be used to construct informative features. This paper develops a novel set of machine-learning classifiers for the domain of classifying files taken from software projects; the target classifications are based on origin analysis. Our approach adapts the output of four copy-analysis tools, generating a number of different measurements. By combining the measures and the files on which they operate, a large set of features is generated in a semi-automatic manner. Standard attribute selection and classifier training techniques then yield a pool of high-quality classifiers (accuracy in the range of 90%) and information on the most relevant features.
Original language | English |
---|---|
Title of host publication | Research and Development in Intelligent Systems XXVII |
Subtitle of host publication | Incorporating Applications and Innovations in Intelligent Systems XVIII - AI 2010, 30th SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence |
Pages | 379-392 |
Number of pages | 14 |
Publication status | Published - 01 Dec 2011 |
Externally published | Yes |
Event | 30th SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, AI 2010 - Cambridge, United Kingdom; Duration: 14 Dec 2010 → 16 Dec 2010 |
Conference
Conference | 30th SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, AI 2010 |
---|---|
Country/Territory | United Kingdom |
City | Cambridge |
Period | 14/12/2010 → 16/12/2010 |
ASJC Scopus subject areas
- Artificial Intelligence
- Information Systems