Using n-grams to rapidly characterise the evolution of software code

Austen Rainer*, Peter C.R. Lane, James A. Malcolm, Sven Bodo Scholz

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Text-based approaches to the analysis of software evolution are attractive because of the fine-grained, token-level comparisons they can generate. The use of such approaches has, however, been constrained by the lack of an efficient implementation. In this paper we demonstrate the ability of Ferret, which uses n-grams of 3 tokens, to characterise the evolution of software code. Ferret's implementation operates in almost linear time and is at least an order of magnitude faster than the diff tool. Ferret's output can be analysed to reveal several characteristics of software evolution, such as: the lifecycle of a single file, the degree of change between two files, and possible regression. In addition, the similarity scores produced by Ferret can be aggregated to measure larger parts of the system being analysed.
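To make the idea of comparing files by 3-token n-grams concrete, here is a minimal Python sketch. It is not Ferret's implementation: the abstract does not say how shared trigrams are turned into a similarity score, so the sketch assumes a Jaccard-style set overlap, and `tokenize` is a hypothetical, deliberately crude tokenizer introduced only for illustration.

```python
import re
from typing import List, Set, Tuple

Trigram = Tuple[str, ...]


def tokenize(source: str) -> List[str]:
    """Hypothetical tokenizer (an assumption, not Ferret's): identifiers and
    integers become single tokens, every other non-space character stands alone."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def trigrams(tokens: List[str]) -> Set[Trigram]:
    """Return the set of all runs of 3 consecutive tokens."""
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def similarity(a: str, b: str) -> float:
    """Assumed score: shared trigrams divided by all distinct trigrams (Jaccard)."""
    ta, tb = trigrams(tokenize(a)), trigrams(tokenize(b))
    if not ta and not tb:
        return 1.0  # two inputs with no trigrams are treated as identical
    return len(ta & tb) / len(ta | tb)


# Compare successive versions of one (made-up) file to trace its lifecycle.
versions = ["int x = 1;", "int x = 2;", "int x = 2;"]
scores = [similarity(u, v) for u, v in zip(versions, versions[1:])]
```

Under these assumptions, a score of 1.0 between consecutive versions indicates no token-level change, and a low score indicates heavy change. A later version scoring highly against a much earlier one could hint at the kind of regression the abstract mentions.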

Original language: English
Title of host publication: Aramis 2008 - 1st International Workshop on Automated engineeRing of Autonomous and runtiMe evolvIng Systems, and ASE2008 the 23rd IEEE/ACM Int. Conf. Automated Software Engineering
Pages: 43-52
Number of pages: 10
DOI: https://doi.org/10.1109/ASEW.2008.4686320
Publication status: Published - 01 Dec 2008
Event: Aramis 2008 - 1st International Workshop on Automated engineeRing of Autonomous and runtiMe evolvIng Systems, in conjunction with ASE2008 the 23rd IEEE/ACM International Conference on Automated Software Engineering - L'Aquila, Italy
Duration: 16 Sep 2008 - 16 Sep 2008

Conference

Conference: Aramis 2008 - 1st International Workshop on Automated engineeRing of Autonomous and runtiMe evolvIng Systems, in conjunction with ASE2008 the 23rd IEEE/ACM International Conference on Automated Software Engineering
Country: Italy
City: L'Aquila
Period: 16/09/2008 - 16/09/2008

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Electrical and Electronic Engineering

Cite this

Rainer, A., Lane, P. C. R., Malcolm, J. A., & Scholz, S. B. (2008). Using n-grams to rapidly characterise the evolution of software code. In Aramis 2008 - 1st International Workshop on Automated engineeRing of Autonomous and runtiMe evolvIng Systems, and ASE2008 the 23rd IEEE/ACM Int. Conf. Automated Software Engineering (pp. 43-52). [4686320] https://doi.org/10.1109/ASEW.2008.4686320