Quality control of metagenomic data and removal of host reads
FastQC version 0.11.8 (17) was used to check the quality of the original sequencing reads. Next, bowtie2 version 2.3.5.1 (18) with default parameters was used to align the quality-filtered reads to the human reference genome (hg38; https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/). Unmapped (non-host) reads were separated from host reads and sorted using samtools version 1.9 (19). The resulting BAM files were converted to FASTQ files using bedtools version 2.28 (20) and then to FASTA files using the R/Bioconductor package ShortRead version 3.6.2 (21). The non-host metagenomic reads of each sputum sample were then passed to bioinformatics pipelines that generate microbiome and functional profiles. We used MetaPhlAn2 version 2.7.7 and its marker database (22) to estimate microbiome profiles; viruses were excluded from the output.
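For illustration, the host-removal steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact commands used in this study: it assumes single-end reads (`-U`), name-based sorting (`samtools sort -n`), and selection of unmapped reads by SAM flag 4, none of which are specified in the text. The file names, the index prefix, and the function names `host_removal_commands` and `fastq_to_fasta` are hypothetical. The FASTQ-to-FASTA conversion is written in plain Python here as a stand-in for the ShortRead step.

```python
from typing import Iterable, Iterator, List


def host_removal_commands(fastq: str, index_prefix: str, sample: str) -> List[List[str]]:
    """Build (but do not run) the command lines for one sample.

    Assumptions (not stated in the text): single-end input, a prebuilt
    bowtie2 index at `index_prefix`, and sorting by read name.
    """
    sam = f"{sample}.sam"
    unmapped_bam = f"{sample}.unmapped.bam"
    sorted_bam = f"{sample}.unmapped.sorted.bam"
    nonhost_fastq = f"{sample}.nonhost.fastq"
    return [
        # Align reads to the host genome with bowtie2 default parameters.
        ["bowtie2", "-x", index_prefix, "-U", fastq, "-S", sam],
        # Keep only reads with SAM flag 4 set (read unmapped), written as BAM.
        ["samtools", "view", "-b", "-f", "4", "-o", unmapped_bam, sam],
        # Sort the unmapped reads by read name (assumed sort order).
        ["samtools", "sort", "-n", "-o", sorted_bam, unmapped_bam],
        # Convert the sorted BAM file back to FASTQ.
        ["bedtools", "bamtofastq", "-i", sorted_bam, "-fq", nonhost_fastq],
    ]


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from four-line FASTQ records, dropping qualities."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, sequence = record[0], record[1]
            if not header.startswith("@"):
                raise ValueError(f"Malformed FASTQ header: {header!r}")
            # Replace the leading '@' with '>' and keep only the sequence line.
            yield ">" + header[1:]
            yield sequence
            record = []
    if record:
        raise ValueError("Truncated FASTQ record at end of input")
```

Each command list can be passed to `subprocess.run(cmd, check=True)` in order, and `fastq_to_fasta` can be applied to an open FASTQ file handle to write the FASTA input for downstream profiling. Paired-end data would require different bowtie2 inputs (`-1`/`-2`) and a different flag filter.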