Analysing quality metrics and automated scoring of code reviews

Owen Sortwell, David Cutting*, Christine McConnellogue

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Code reviews are an important part of the software development process, and a wide variety of approaches is used to perform them. While it is generally agreed that code reviews are beneficial and result in higher-quality software, little work has been done to investigate best practices and approaches or to explore which factors impact code review quality. Our approach first analyses current best practices and procedures for undertaking code reviews, together with metrics often used to assess a review's quality and current offerings for automated code review assessment. A maximum of one thousand code review comments per project were mined from GitHub pull requests across seven open-source projects that have previously been analysed in similar studies. Several identified metrics are tested across these projects using Python's Natural Language Toolkit, including stop word ratio, overall sentiment, and detection of code snippets through the GitHub markdown language. Comparisons are drawn with regard to each project's culture and the language used in the code review process, with pros and cons for each. The results showed that the stop word ratio remained consistent across all projects, with only one project exceeding an average of 30%, and that the percentage of positive comments was also broadly similar across the projects. The suitability of these metrics is also discussed with regard to the creation of a scoring framework and the development of an automated code review analysis tool. We conclude that the software written is an effective means of comparing practices and cultures across projects and can be of benefit by promoting a positive review culture within an organisation. However, rudimentary sentiment analysis and detection of GitHub code snippets may not be sufficient to assess a code review's overall usefulness, as many terms that are important in a programmer's lexicon, such as 'error' and 'fail', cause a review to be classed as negative. Code snippets included outside of the markdown language are also excluded from the analysis. Recommendations for future work are suggested, including the development of a more robust sentiment analysis system that can detect emotions such as frustration, and the creation of a programming dictionary to exclude programming terms from sentiment analysis.
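
As a rough illustration (not the authors' implementation), the three metrics described in the abstract can be computed with Python's Natural Language Toolkit along the following lines. The function names, regular expressions, and the choice of NLTK's VADER model for sentiment are assumptions made for this sketch.

```python
# Minimal sketch of the three metrics from the abstract: stop word ratio,
# overall sentiment, and detection of GitHub-markdown code snippets.
# Names and thresholds here are illustrative, not the paper's implementation.
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# One-off downloads of the NLTK resources this sketch relies on.
nltk.download("stopwords", quiet=True)
nltk.download("punkt", quiet=True)
nltk.download("vader_lexicon", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
FENCED_CODE = re.compile(r"```.*?```", re.DOTALL)   # ``` ... ``` blocks
INLINE_CODE = re.compile(r"`[^`]+`")                # `inline` spans
SIA = SentimentIntensityAnalyzer()

def analyse_comment(comment: str) -> dict:
    """Return the three metrics for a single review comment."""
    # Detect and strip GitHub-markdown code snippets before text analysis.
    has_snippet = bool(FENCED_CODE.search(comment) or INLINE_CODE.search(comment))
    prose = INLINE_CODE.sub(" ", FENCED_CODE.sub(" ", comment))

    # Stop word ratio over alphabetic tokens only.
    tokens = [t.lower() for t in word_tokenize(prose) if t.isalpha()]
    stop_ratio = sum(t in STOP_WORDS for t in tokens) / len(tokens) if tokens else 0.0

    # VADER compound score: > 0 read as positive, < 0 as negative.
    sentiment = SIA.polarity_scores(prose)["compound"]

    return {
        "stop_word_ratio": stop_ratio,
        "sentiment": sentiment,
        "contains_code_snippet": has_snippet,
    }

# Example usage on a hypothetical review comment.
print(analyse_comment("Looks good, but this will `fail` on empty input."))
```

The example also hints at the limitation noted above: a comment containing the word "fail" is pulled towards a negative sentiment score even when the feedback is constructive.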
Original language: English
Pages (from-to): 514-533
Journal: Software
Volume: 3
Issue number: 4
DOIs
Publication status: Published - 29 Nov 2024

Keywords

  • Quality Metrics
  • Automated Scoring
  • Code Reviews
  • Software Development

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Software
