Quality control of metagenomic data and removal of host reads
FastQC version 0.11.8 (17) was used to check the quality of the raw
sequencing reads. Bowtie2 version 2.3.5.1 (18), run with its default
parameters, was then used to align the quality-filtered reads to the
human reference genome (hg38;
https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/).
Unmapped (non-host) reads were separated from host reads and sorted
using samtools version 1.9 (19). The resulting BAM files were converted
to FASTQ files using bedtools version 2.28 (20) and then to FASTA files
using the R/Bioconductor package ShortRead version 3.6.2 (21). The
non-host metagenomic reads from each sputum sample were then processed
with bioinformatics pipelines that generate microbiome and functional
profiles. We used MetaPhlAn2 version 2.7.7 and its marker database (22)
to estimate microbiome profiles; viruses were excluded from the output.
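
The host-removal steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the exact commands used in this study: it assumes single-end reads and a prebuilt bowtie2 index, and the file names, the helper names (host_removal_commands, fastq_to_fasta) and the choice of name-sorting are hypothetical. The command lines are only built, not executed, and the pure-Python FASTQ-to-FASTA converter stands in for the ShortRead step.

```python
import shlex
from typing import Iterable, List


def host_removal_commands(index: str, reads: str, prefix: str) -> List[List[str]]:
    """Build (but do not run) command lines for host-read removal.

    Assumptions (not stated in the text): single-end input reads, a
    prebuilt bowtie2 index at `index`, and output names derived from
    `prefix`.
    """
    sam = f"{prefix}.sam"
    unmapped = f"{prefix}.unmapped.bam"
    sorted_bam = f"{prefix}.unmapped.sorted.bam"
    fastq = f"{prefix}.nonhost.fastq"
    return [
        # Align reads to the host genome using bowtie2 defaults.
        ["bowtie2", "-x", index, "-U", reads, "-S", sam],
        # Keep only reads whose SAM flag has bit 0x4 set (unmapped), as BAM.
        ["samtools", "view", "-b", "-f", "4", "-o", unmapped, sam],
        # Sort the unmapped reads by read name (an assumed sort order).
        ["samtools", "sort", "-n", "-o", sorted_bam, unmapped],
        # Convert the sorted BAM file to FASTQ.
        ["bedtools", "bamtofastq", "-i", sorted_bam, "-fq", fastq],
    ]


def fastq_to_fasta(lines: Iterable[str]) -> List[str]:
    """Convert four-line FASTQ records to FASTA lines (header, sequence)."""
    stripped = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(stripped) % 4 != 0:
        raise ValueError("FASTQ input is not a multiple of four lines")
    fasta: List[str] = []
    for i in range(0, len(stripped), 4):
        header, seq, plus, _qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        # Replace the leading '@' with '>' and drop the quality line.
        fasta.append(">" + header[1:])
        fasta.append(seq)
    return fasta


if __name__ == "__main__":
    # Print the commands for a hypothetical sample named "sample1".
    for cmd in host_removal_commands("hg38_index", "sample1.fastq", "sample1"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Building the commands as argument lists keeps each step inspectable before it is run, and each list could be passed to subprocess.run unchanged. For paired-end data the bowtie2 and bedtools invocations would need the corresponding paired-end options.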