2.3 | Construction of the gene catalogue
Shotgun sequencing reads from fecal samples of R. roxellana, R. bieti, and R. strykeri individuals
were independently processed. We filtered the raw data using Trimmomatic
(v0.36) (Bolger, Lohse, & Usadel, 2014) to exclude adapter sequences
and low-quality reads. To remove host and human contaminants, we then
mapped the filtered reads against the genomes of R. strykeri (assembly
ASM2376470v1), R. bieti (assembly ASM169854v2), R. roxellana (assembly
ASM756505v1), and human (assembly GRCh38.p13) with Bowtie2
(v2.3.5) (Langmead & Salzberg, 2012). The remaining reads were
considered high-quality reads. In total, we obtained 1,448 GB of
high-quality reads with an average of 10.13 GB per sample. To construct
a comprehensive catalogue of reference genes in the SNM gut microbiome,
we individually assembled the high-quality reads from each sample into
longer contigs with MEGAHIT (v1.2.6) (Li et al., 2015). We obtained
62,306,924 contigs longer than 300 bp. Next, MetaGeneMark (Zhu et al.,
2010) was used to predict open reading frames (ORFs). We obtained
111,177,047 ORFs longer than 102 bp from 143 samples, with an average of
777,462 ORFs per sample (Table S2). Three non-redundant
gene sets, one each for the R. roxellana, R. bieti, and R. strykeri gut
microbiomes, were generated independently by clustering with CD-HIT (Fu
et al., 2012). We then merged these three non-redundant gene sets with
CD-HIT (Fu et al., 2012) into an integrated catalogue of reference genes
in the SNMs, hereafter referred to as the RGC (Figure 1).
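
The two-stage redundancy removal described above (clustering within each species, then merging the three non-redundant sets) can be illustrated with a minimal Python sketch. It is a toy stand-in for greedy incremental clustering of the kind CD-HIT performs, not the CD-HIT implementation itself. The k-mer containment score, the 0.95 threshold, the function names, and the example sequences are all illustrative assumptions; no clustering parameters are reported in this section.

```python
from typing import Dict, Set


def kmers(seq: str, k: int = 4) -> Set[str]:
    """Return the set of k-mers in a sequence (empty if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(query: str, representative: str, k: int = 4) -> float:
    """Toy score (assumption): fraction of the query's k-mers present in the representative."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0
    return len(query_kmers & kmers(representative, k)) / len(query_kmers)


def greedy_cluster(genes: Dict[str, str], threshold: float = 0.95) -> Dict[str, str]:
    """Greedy incremental clustering: process genes from longest to shortest and
    keep a gene as a new representative only if it is not similar enough to any
    representative chosen so far. Returns the representatives (id -> sequence)."""
    representatives: Dict[str, str] = {}
    # Longest first; ties are broken by identifier so the result is deterministic.
    ordered = sorted(genes.items(), key=lambda item: (-len(item[1]), item[0]))
    for gene_id, seq in ordered:
        redundant = any(
            similarity(seq, rep_seq) >= threshold
            for rep_seq in representatives.values()
        )
        if not redundant:
            representatives[gene_id] = seq
    return representatives


def build_catalogue(per_species: Dict[str, Dict[str, str]],
                    threshold: float = 0.95) -> Dict[str, str]:
    """Stage 1: cluster each species' genes separately.
    Stage 2: pool the per-species representatives and cluster them again."""
    pooled: Dict[str, str] = {}
    for species, genes in per_species.items():
        for gene_id, seq in greedy_cluster(genes, threshold).items():
            # Prefix identifiers with the species name so they stay unique after pooling.
            pooled[f"{species}|{gene_id}"] = seq
    return greedy_cluster(pooled, threshold)


# Hypothetical toy input: short made-up sequences, not data from this study.
example = {
    "roxellana": {"g1": "ATGGCTAAGCTTGGA", "g2": "ATGGCTAAGCTT"},
    "bieti": {"g1": "ATGGCTAAGCTTGGA", "g2": "ATGCCCGGGTTTAAACCC"},
    "strykeri": {"g1": "ATGCCCGGGTTTAAA"},
}
catalogue = build_catalogue(example)
print(sorted(catalogue))
```

In this toy input, the shorter R. roxellana gene is absorbed within its own species in the first stage, while genes shared across species are collapsed only in the second stage. This mirrors why the merged catalogue can be smaller than the sum of the three species-level gene sets.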