SoftWhere? Searching for research software in the UK

Research output: Contribution to conference, Paper, peer-review

Abstract

REF, the UK’s Research Excellence Framework, periodically assesses the quality and impact of higher education research in the UK. Submissions of research software to REF declined steeply from 2008 to 2021, even as the Research Software Engineering (RSE) discipline grew rapidly over the same period. To date, no one has investigated how the official academic Institutional Repositories (IRs) have contributed to these low return rates. In what we believe to be the first census of its kind, we queried 180 online repositories belonging to over 150 UK universities. We found that the prevalence of software records within UK IRs is worryingly low, and that even fewer repositories record software as a recognised academic output. Of greater concern, a large majority of repositories simply cannot record software as a distinct type of research output, despite using controlled metadata formats and vocabularies that include software as a type. Several universities appear to have removed software as a defined type from the default settings of their repository, suggesting that institutional policy is an underlying issue. Indeed, on the most popular repository platform, a one-word change to a configuration file would enable a software output type. We also explored potentially correlated variables, such as whether the institution lists an RSE team, but found no correlation between these variables and the prevalence of software records. This raises the question: where is all the research software in the UK?

In the second part of this research, we sought to establish where research software from UK academic institutions is kept, recorded, or registered. We compared the software records in institutional repositories with the outputs recorded in the UK’s Gateway to Research, a publicly accessible database of all outputs reported from government-funded research. Gateway to Research contained five times as many software outputs as the institutional repositories.
Of the 7232 software outputs in Gateway to Research, only 71% had a URL linking to the software, and fewer than 4000 of those URLs worked. When we categorised these URLs, the single largest source category was commercial code repositories. Within this category, the overwhelming majority of software was hosted on GitHub, which holds the software behind around one third of all working URLs disclosed from publicly funded research. Finally, we discuss what these findings imply about the lack of institutional recognition of software as a discrete research output, despite funders mandating such recognition, and we make recommendations for changes to policies and operating procedures.
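The abstract does not say how the 180 repositories were queried. As a hedged illustration only, the sketch below assumes a repository exposes its records over OAI-PMH in simple Dublin Core (`oai_dc`) and counts records whose `dc:type` is `Software`, a term in the DCMI Type Vocabulary. The function `count_software_records` and the sample response are hypothetical. Real repositories often use local type labels rather than DCMI terms, and a full harvest would also have to follow `resumptionToken` paging, which is omitted here.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 protocol and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def count_software_records(xml_text: str) -> tuple[int, int]:
    """Return (software_records, total_records) for one ListRecords page.

    Hypothetical helper: a record counts as software only if one of its
    dc:type values is exactly "software" (case-insensitive).
    """
    root = ET.fromstring(xml_text)
    records = root.findall(".//oai:record", NS)
    software = 0
    for record in records:
        # Collect every dc:type value on this record, normalised for comparison.
        types = {
            element.text.strip().lower()
            for element in record.findall(".//dc:type", NS)
            if element.text
        }
        if "software" in types:
            software += 1
    return software, len(records)


# Hypothetical single-page response with one article and one software record.
SAMPLE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.ac.uk:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:type>Article</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.ac.uk:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:type>Software</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(count_software_records(SAMPLE))  # (1, 2)
```

Exact matching on the type string keeps the sketch simple; a census across many platforms would more likely map each repository's local vocabulary onto a shared set of types first.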
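The reported figures imply that about 5135 of the 7232 software outputs (71%) had a URL, so fewer than 4000 working URLs means that under 55% of all 7232 outputs had a link that worked. The categorisation step can be sketched by grouping URLs on their hostname. This is a minimal sketch under stated assumptions: the `HOST_CATEGORIES` mapping, the category names, and the helpers `categorise` and `category_shares` are hypothetical and are not the paper's actual scheme, and checking whether each URL actually works is omitted.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from hostname to source category, for illustration only.
HOST_CATEGORIES = {
    "github.com": "commercial code repository",
    "gitlab.com": "commercial code repository",
    "bitbucket.org": "commercial code repository",
    "zenodo.org": "archive",
}


def categorise(url: str) -> str:
    """Assign a URL to a source category based on its hostname."""
    host = urlparse(url).hostname or ""  # lower-cased by urlparse; None if absent
    if host.startswith("www."):
        host = host[len("www."):]
    if host in HOST_CATEGORIES:
        return HOST_CATEGORIES[host]
    if host.endswith(".ac.uk"):
        return "institutional website"
    return "other"


def category_shares(urls: list[str]) -> dict[str, float]:
    """Return the fraction of URLs in each category, largest first."""
    counts = Counter(categorise(u) for u in urls)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical URLs, not data from the study.
    urls = [
        "https://github.com/example-lab/tool",
        "https://www.github.com/example-lab/other-tool",
        "https://zenodo.org/records/1",
        "https://www.example.ac.uk/software/model",
    ]
    print(category_shares(urls))
    # {'commercial code repository': 0.5, 'archive': 0.25, 'institutional website': 0.25}
```

Keying on the hostname rather than the full URL keeps the categories coarse, which matches the level at which the abstract reports its results (source category first, then individual hosts such as GitHub).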

Original language: English
Publication status: Accepted - 06 Oct 2023
Event: Software Engineering for Research Software Engineering 2023 - University of Chicago, Chicago, United States
Duration: 16 Oct 2023 to 18 Oct 2023
https://se4science.org/workshops/se4rse23/index.htm

Workshop

Workshop: Software Engineering for Research Software Engineering 2023
Abbreviated title: SE4RSE'23
Country/Territory: United States
City: Chicago
Period: 16/10/2023 to 18/10/2023

