Evaluating the quality and quantity of data on open source software projects

Austen Rainer*, Stephen Gale

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

28 Citations (Scopus)

Abstract

In this paper, we provide a preliminary evaluation of the quality and quantity of data on open source (OS) projects, provided at the SourceForge.net portal. We have derived a dataset of approximately 50000 projects from SourceForge. Using several indicators of project activity, we identify two samples from the entire dataset: the 'most active' OS projects (a total of 456 projects, ~0.9% of the entire dataset), and those projects with active code development (5826 projects, ~11.6%). The number of projects that are active across all of our main indicators of activity accounts for less than 1% of the projects on the portal. This suggests that many OS projects being registered on SourceForge are 'impulse' projects, which do not gather sufficient interest from developers or users to 'activate' those projects and make them 'successful'. It also suggests that researchers, developers and users should be careful about how they use OS portals.
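
The proportions quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only: it assumes a dataset of exactly 50,000 projects, whereas the abstract gives only an approximate size, and the names `TOTAL_PROJECTS` and `share_percent` are hypothetical helpers, not part of the paper.

```python
# Assumption: the dataset is taken as exactly 50,000 projects;
# the abstract only states "approximately 50000".
TOTAL_PROJECTS = 50_000


def share_percent(sample_size: int, total: int = TOTAL_PROJECTS) -> float:
    """Return sample_size as a percentage of total."""
    return 100.0 * sample_size / total


# Sample sizes as reported in the abstract.
most_active = share_percent(456)    # 'most active' projects
active_code = share_percent(5826)   # projects with active code development

print(f"most active: {most_active:.2f}%")
print(f"active code development: {active_code:.2f}%")
```

Under this assumption the shares come out close to the reported ~0.9% and ~11.6%; any small difference is expected, since the exact dataset size is not given in the abstract.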

Original language: English
Pages: 29-36
Number of pages: 8
Publication status: Published - 01 Dec 2005
Event: 1st International Conference on Open Source Systems, OSS 2005 - Genova, Italy
Duration: 11 Jul 2005 to 15 Jul 2005

Conference

Conference: 1st International Conference on Open Source Systems, OSS 2005
Country/Territory: Italy
City: Genova
Period: 11/07/2005 to 15/07/2005

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
