Identification, recovery, and refinement of hitherto undescribed population-level genomes from the human gastrointestinal tract

C. C. Laczny, E. E. L. Muller, A. Heintz-Buschart, M. Herold, L. A. Lebrun, A. Hogan, P. May, C. De Beaufort, P. Wilmes

Frontiers in Microbiology, 25 May 2016, doi: 10.3389/fmicb.2016.00884

Linking taxonomic identity and functional potential at the population level is important for the study of mixed microbial communities and is greatly facilitated by the availability of microbial reference genomes.

While the culture-independent recovery of population-level genomes from environmental samples by binning metagenomic data has expanded the available reference genome catalogues, several microbial lineages remain underrepresented.

Here we present two reference-independent approaches for the identification, recovery, and refinement of hitherto undescribed population-level genomes.
The first approach is aimed at recovering genomes of varied taxa and involves multi-sample automated binning using Canopy Clustering, complemented post hoc by visualisation and human-augmented binning using VizBin. The second approach is particularly well suited to the study of specific taxa and employs VizBin de novo.
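To illustrate the idea behind multi-sample automated binning, the sketch below implements generic canopy clustering over per-sample abundance profiles: sequences whose abundances rise and fall together across samples are grouped. This is a minimal sketch under stated assumptions, not the implementation used by the authors. The distance (1 minus Pearson correlation), the hypothetical names `canopy_cluster` and `pearson`, and the `loose`/`tight` threshold values are all illustrative choices.

```python
import math
import random
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no co-abundance signal
    return cov / math.sqrt(var_a * var_b)


def canopy_cluster(
    profiles: Dict[str, List[float]],
    loose: float = 0.5,   # hypothetical: max distance to join a canopy
    tight: float = 0.1,   # hypothetical: distance below which a point can no longer seed a canopy
    seed: int = 0,
) -> List[List[str]]:
    """Generic canopy clustering with distance = 1 - Pearson r.

    Assumption for illustration only; canopies may overlap because
    membership is tested against all points, not just unused ones.
    """
    if tight > loose:
        raise ValueError("tight threshold must not exceed loose threshold")
    order = list(profiles)
    random.Random(seed).shuffle(order)  # random but reproducible centre order
    candidates = set(order)             # points still allowed to seed a canopy
    canopies: List[List[str]] = []
    for centre in order:
        if centre not in candidates:
            continue
        candidates.discard(centre)
        members = [centre]
        for other in order:
            if other == centre:
                continue
            d = 1.0 - pearson(profiles[centre], profiles[other])
            if d <= loose:
                members.append(other)      # co-abundant: join this canopy
            if d <= tight:
                candidates.discard(other)  # too close to seed its own canopy
        canopies.append(sorted(members))
    return canopies


if __name__ == "__main__":
    # Toy abundance profiles across four samples (invented for illustration).
    toy = {
        "contig_a": [1, 2, 3, 4],
        "contig_b": [2, 4, 6, 8],
        "contig_c": [4, 3, 2, 1],
        "contig_d": [8, 6, 4, 2],
    }
    for canopy in sorted(canopy_cluster(toy)):
        print(canopy)
```

On the toy data, the two pairs with perfectly correlated profiles fall into two separate canopies. In practice, such automatically derived bins would then be inspected and refined visually, which is the step the approach above delegates to VizBin.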

Using these approaches, we reconstructed a total of six population-level genomes of distinct and divergent representatives of the Alphaproteobacteria class, the Mollicutes class, the Clostridiales order, and the Melainabacteria class from human gastrointestinal tract-derived metagenomic data.

Our results demonstrate that, while automated binning approaches provide great potential for large-scale studies of mixed microbial communities, these approaches should be complemented with informative visualisations because expert-driven inspection and refinements are critical for the recovery of high-quality population-level genomes.