Computational haplotype recovery and long-read validation identifies novel isoforms of industrially relevant enzymes from natural microbial communities

Samuel M Nicholls, Wayne Aubrey, Arwyn Edwards, Kurt de Grave, Sharon Huws, Leander Schietgat, André Soares, Christopher J Creevey, Amanda Clare

Research output: Other contribution

Abstract

Population-level diversity of natural microbiomes represents a biotechnological resource for biomining, biorefining and synthetic biology, but exploiting it requires the recovery of the exact DNA sequence (or "haplotype") of the genes and genomes of every individual present. Computational haplotype reconstruction is extremely difficult, and is complicated further by environmental sequencing data (metagenomics). Current approaches cannot choose between alternative haplotype reconstructions and fail to provide biological evidence of correct predictions. To overcome this, we present Hansel and Gretel: a novel probabilistic framework that reconstructs the most likely haplotypes from complex microbiomes, is robust to sequencing error and uses all available evidence from aligned reads, without altering or discarding observed variation. We provide the first formalisation of this problem and propose "metahaplome" as a definition for the set of haplotypes for any genomic region of interest within a metagenomic dataset. Finally, using long-read sequencing, we demonstrate biological evidence of novel haplotypes of industrially important enzymes computationally predicted from a natural microbiome.
Original language: English
Type: Online paper
Media of output: bioRxiv preprint server
Number of pages: 1
Publication status: Published - 13 Jan 2018

Publication series

Name: bioRxiv
