Abstract
Microbes use a range of genetic codes and gene structures, yet these are often ignored during metagenomic analysis. This causes spurious protein predictions, preventing functional assignment, which limits our understanding of ecosystems. To resolve this, we developed a lineage-specific gene prediction approach that uses the correct genetic code based on the taxonomic assignment of genetic fragments, removes incomplete protein predictions, and optimises prediction of small proteins. Applied to 9,634 metagenomes and 3,594 genomes from the human gut, this approach increased the landscape of captured expressed microbial proteins by 78.9%, including previously hidden functional groups. Optimised small protein prediction captured 3,772,658 small protein clusters, which form an improved microbial protein catalogue of the human gut (MiProGut). To enable the ecological study of a protein's prevalence and association with host parameters, we developed InvestiGUT, a tool that integrates both the protein sequences and sample metadata. Accurate prediction of proteins is critical to providing a functional understanding of microbiomes, enhancing our ability to study interactions between microbes and hosts.
| Original language | English |
|---|---|
| Article number | 3204 |
| Number of pages | 12 |
| Journal | Nature Communications |
| Volume | 16 |
| Issue number | 1 |
| Publication status | Published - 01 Apr 2025 |
Keywords
- humans
- bacteria
- bacterial proteins
- computational biology
- phylogeny
- metagenome
- metagenomics
- gastrointestinal microbiome
Fingerprint
Dive into the research topics of 'Lineage-specific microbial protein prediction enables large-scale exploration of protein ecology within the human gut'. Together they form a unique fingerprint.