Co-abundance Gene Groups and Metagenomic Species/Strains with a consistent definition and efficient algorithm
De Filippis et al.48 reported that, within the genus Prevotella, distinct oligotypes were differentially associated with non-vegetarian and vegetarian diets. In the present study, the representative sequences of OTUs assigned to Prevotella were mapped onto the known oligotypes to check the species association. About 90% of the Prevotella OTU sequences belonged to the P11 and P12 oligotypes, which are representative of the non-vegetarian type, indicating their prevalence in the North Indian population (Supplementary Table ST1).
To quantify species- and strain-level genomic variation accurately and broadly at four different heights, co-abundance gene groups (CAGs) were identified by binning the SOAPdenovo assembly of all shotgun metagenome samples. A total of 57 bins were obtained with Metabat2. Twelve bins, each containing more than 150 genes, were selected for further processing to avoid erroneous, inconsistent, and incomplete annotations that would affect taxonomic assignment. Differentially abundant genes with p < 0.05 were considered for visualization (Supplementary Figure S5).
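The two thresholds described above (more than 150 genes per bin, and p < 0.05 for genes carried forward to visualization) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. The function names, the dictionary inputs, and the bin and gene identifiers are hypothetical and serve only to show the filtering logic.

```python
# Minimal sketch of the two filters described in the text.
# All names and example values are hypothetical placeholders.

MIN_GENES_PER_BIN = 150  # bins must contain MORE than 150 genes (from the text)
P_VALUE_CUTOFF = 0.05    # genes must have p < 0.05 (from the text)


def select_bins(gene_counts, min_genes=MIN_GENES_PER_BIN):
    """Return the names of bins with strictly more than `min_genes` genes."""
    return sorted(name for name, n in gene_counts.items() if n > min_genes)


def genes_for_visualization(gene_pvalues, cutoff=P_VALUE_CUTOFF):
    """Return the genes whose p-value is strictly below `cutoff`."""
    return sorted(gene for gene, p in gene_pvalues.items() if p < cutoff)


if __name__ == "__main__":
    # Hypothetical gene counts per bin (assumed input format).
    example_bins = {"bin_1": 420, "bin_2": 150, "bin_3": 151, "bin_4": 12}
    print(select_bins(example_bins))

    # Hypothetical per-gene p-values from a differential abundance test.
    example_genes = {"gene_a": 0.001, "gene_b": 0.05, "gene_c": 0.2}
    print(genes_for_visualization(example_genes))
```

Both comparisons are strict, following the wording "greater than 150 genes" and "p <0.05". A bin with exactly 150 genes, or a gene with p exactly 0.05, is therefore excluded in this sketch.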
We used an integrated pipeline to profile both species- and strain-level abundance and genomic variation from metagenomes. The MIDAS analysis pipeline detected a few more bacteria in addition to those obtained from Metabat2. MIDAS captured the majority of microbial species abundance across the subjects, making it well suited for uncovering strain-level variation associated with the various heights. For species with maximum coverage, reads were aligned to a pan-genome database of genes to estimate gene coverage, copy number, and presence or absence, and finally to detect SNPs. The pangenome-reconstructed bacterial profile across all time points was filtered for a minimum pangenome coverage of 70%, yielding 37 species and strains (Supplementary Figure S6). The resulting relative abundance profile was analysed with a t-test to identify highly significant taxa at an FDR cut-off of 0.05. The Metabat2 and MIDAS results were significantly correlated, supporting the presence of Roseburia, Prevotella, Faecalibacterium, Eubacterium, and Bacteroides, which were significantly enriched among the 37 species by both methods of analysis.
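The coverage filter and FDR cut-off described above can be sketched in Python as follows. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption. The function names, input dictionaries, and taxon labels are likewise hypothetical. The p-values are assumed to come from a t-test run upstream, and this sketch does not call MIDAS.

```python
# Minimal sketch: 70% pangenome coverage filter followed by an FDR cut-off.
# ASSUMPTION: Benjamini-Hochberg is used for FDR control; the text does not
# state which procedure was applied. All names and values are hypothetical.

MIN_PANGENOME_COVERAGE = 0.70  # 70% threshold stated in the text
FDR_CUTOFF = 0.05              # FDR cut-off stated in the text


def filter_by_coverage(coverage, min_cov=MIN_PANGENOME_COVERAGE):
    """Keep taxa whose pangenome coverage fraction is at least `min_cov`."""
    return {taxon: c for taxon, c in coverage.items() if c >= min_cov}


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are non-decreasing in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_taxa(pvalues_by_taxon, cutoff=FDR_CUTOFF):
    """Return taxa whose adjusted p-value is at or below `cutoff`."""
    taxa = list(pvalues_by_taxon)
    adjusted = benjamini_hochberg([pvalues_by_taxon[t] for t in taxa])
    return {t: q for t, q in zip(taxa, adjusted) if q <= cutoff}


if __name__ == "__main__":
    # Hypothetical coverage fractions and t-test p-values per taxon.
    coverage = {"taxon_A": 0.92, "taxon_B": 0.65, "taxon_C": 0.70, "taxon_D": 0.81}
    raw_p = {"taxon_A": 0.01, "taxon_B": 0.002, "taxon_C": 0.04, "taxon_D": 0.20}

    kept = filter_by_coverage(coverage)
    tested = {t: raw_p[t] for t in kept}
    print(significant_taxa(tested))
```

Applying the coverage filter before the multiple-testing correction means that only taxa passing the 70% threshold count toward the number of tests. Whether the study corrected before or after filtering is not stated, so this ordering is also an assumption.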