StORF-Reporter: finding genes between genes

Research output: Contribution to journal > Article > peer-review

3 Citations (Scopus)
84 Downloads (Pure)

Abstract

Large regions of prokaryotic genomes are currently without any annotation, in part due to well-established limitations of annotation tools. For example, it is routine for genes using alternative start codons to be misreported or completely omitted. Therefore, we present StORF-Reporter, a tool that takes an annotated genome and returns regions that may contain missing CDS genes from unannotated regions. StORF-Reporter consists of two parts. The first begins with the extraction of unannotated regions from an annotated genome. Next, Stop-ORFs (StORFs) are identified in these unannotated regions. StORFs are open reading frames that are delimited by stop codons and thus can capture those genes most often missing in genome annotations. We show this methodology recovers genes missing from canonical genome annotations. We inspect the results of the genomes of model organisms, the pangenome of Escherichia coli, and a set of 5109 prokaryotic genomes of 247 genera from the Ensembl Bacteria database. StORF-Reporter extended the core, soft-core and accessory gene collections, identified novel gene families and extended families into additional genera. The high levels of sequence conservation observed between genera suggest that many of these StORFs are likely to be functional genes that should now be considered for inclusion in canonical annotations.
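
The two steps described in the abstract (extracting unannotated regions, then finding stop-delimited ORFs within them) can be illustrated with a minimal Python sketch. This is not the StORF-Reporter implementation. The function names, the 0-based half-open coordinates, the `min_len` threshold, and the choice to report each span from the first base after the opening stop through the closing stop are all assumptions made for illustration. Only the forward strand is scanned, using the stop codons TAA, TAG and TGA.

```python
# Illustrative sketch only; names and conventions here are hypothetical,
# not taken from StORF-Reporter itself.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def unannotated_regions(genome_length, annotated):
    """Return 0-based half-open (start, end) gaps not covered by any
    annotated interval. Overlapping annotations are merged implicitly."""
    gaps = []
    cursor = 0  # end of the furthest annotation seen so far
    for start, end in sorted(annotated):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < genome_length:
        gaps.append((cursor, genome_length))
    return gaps


def stop_to_stop_orfs(seq, min_len=30):
    """Return (start, end, frame) for spans lying between two consecutive
    in-frame stop codons on the forward strand.

    Assumed convention: start is the first base after the opening stop,
    end is the position just past the closing stop, and min_len is the
    minimum number of bases between the two stops.
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        prev_stop = None
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if prev_stop is not None:
                    start = prev_stop + 3
                    if i - start >= min_len:
                        found.append((start, i + 3, frame))
                prev_stop = i
    return found


# Usage: find gaps in a toy annotation, then scan a toy sequence.
gaps = unannotated_regions(100, [(10, 20), (15, 40), (60, 80)])
orfs = stop_to_stop_orfs("TAAGCTGCTGCTTGA", min_len=9)
```

Because such a span is bounded by stop codons on both sides rather than by a start codon and a stop codon, it does not depend on recognising the start codon, which is why the abstract notes that StORFs can capture genes using alternative start codons.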
Original language: English
Pages (from-to): 11504-11517
Number of pages: 14
Journal: Nucleic Acids Research
Volume: 51
Issue number: 21
Early online date: 28 Oct 2023
Publication status: Published - 27 Nov 2023

Bibliographical note

Funding Information:
N.J.D. was funded by an IBERS Aberystwyth University PhD fellowship and was supported by Farncombe Digestive Health Disease Institute (McMaster University); Weston Family Microbiome Initiative; C.J.C. wishes to acknowledge funding from the BBSRC [BB/E/W/10964A01, BBS/OS/GC/000011B]; DAFM Ireland/DAERA Northern Ireland [Meth-Abate, R3192GFS]; EU via Horizon 2020 [818368, MASTER and 101000213, Holoruminant]. Funding for open access charge: Read and Publish agreement.

Publisher Copyright:
© 2023 The Author(s). Published by Oxford University Press on behalf of Nucleic Acids Research.

Keywords

  • Genetics

ASJC Scopus subject areas

  • Genetics
