Characterizing larval breeding sites: bacterial community composition
In addition to microbiome density, we performed 16S rRNA gene amplicon sequencing to characterize the bacterial community composition of most larval breeding sites (Table 1). This was motivated by previous studies suggesting that bacterial communities differ between habitats (Dickson et al., 2017). Sample processing and sequencing library preparation are described in the Appendix. In short, we collected cells from the water by centrifugation or filtration, extracted DNA, and amplified the V4 region of the 16S rRNA gene using the primers reported by Kozich et al. (2013). Amplicons from multiple samples were multiplexed and sequenced on an Illumina MiSeq (Illumina, USA) at the Yale Center for Genome Analysis. We sequenced the La Lopé and Rabai samples separately.
We demultiplexed the sequencing reads using USEARCH v10.0.240 (Edgar, 2010) and followed the DADA2 (v1.8.0) pipeline (Callahan et al., 2016) to determine bacterial community composition. DADA2 estimates sequencing error rates and infers exact sequence variants, known as amplicon sequence variants (ASVs). ASVs are analogous to conventional operational taxonomic units (OTUs) but resolve single-nucleotide differences. For taxonomic assignment, ASVs were matched against the Ribosomal Database Project (RDP) 16S rRNA gene reference database (RDP trainset 16 and RDP species assignment 16; Cole et al., 2014).
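Once each ASV has a taxonomic assignment, its read counts can be summed within taxa at a higher rank. This aggregation is required for the species-, genus-, and family-level analyses. The following is a minimal Python sketch of that aggregation, not the study's R/phyloseq code. The dictionary layout, the function name `collapse_to_rank`, and the placeholder taxon names are all hypothetical. The sketch pools ASVs that are unassigned at the requested rank into an "Unassigned" group. Dropping them is an equally common choice, and the text does not say which approach was used.

```python
from collections import defaultdict


def collapse_to_rank(counts, taxonomy, rank):
    """Sum ASV read counts within each taxon at `rank` (e.g. "Family").

    counts:   {sample: {asv_id: read_count}}  (hypothetical layout)
    taxonomy: {asv_id: {rank_name: taxon_name or None}}
    ASVs with no assignment at `rank` are pooled as "Unassigned".
    """
    collapsed = {}
    for sample, asv_counts in counts.items():
        totals = defaultdict(int)
        for asv, n in asv_counts.items():
            name = taxonomy[asv].get(rank) or "Unassigned"
            totals[name] += n
        collapsed[sample] = dict(totals)
    return collapsed


# Hypothetical toy data (placeholder names, not study results).
counts = {"site1": {"ASV1": 10, "ASV2": 5, "ASV3": 2}}
taxonomy = {
    "ASV1": {"Family": "FamilyA", "Genus": "GenusA1"},
    "ASV2": {"Family": "FamilyA", "Genus": None},
    "ASV3": {"Family": "FamilyB", "Genus": "GenusB1"},
}
by_family = collapse_to_rank(counts, taxonomy, "Family")
by_genus = collapse_to_rank(counts, taxonomy, "Genus")
```

In this example, ASV2 contributes to FamilyA at the family level but falls into "Unassigned" at the genus level. As a result, the number of taxa and the diversity estimates can change with rank even though the underlying reads are the same.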
Using the R package phyloseq (McMurdie & Holmes, 2013), we first calculated the alpha diversity of the bacterial community at each larval breeding site. Alpha diversity was measured by the Shannon index (Shannon, 1948), computed on raw read counts. We then compared this index across larval breeding site groups, across habitats, and between sites where Ae. aegypti was present and absent. Community composition was summarized by non-metric multidimensional scaling (NMDS) of the Bray-Curtis distance matrix. Like principal component analysis (PCA), NMDS summarizes multivariate data, treating each bacterial taxon as one variable. However, NMDS is better suited to bacterial composition data (Ramette, 2007) because it can use any dissimilarity measure, such as Bray-Curtis, and preserves only the rank order of pairwise dissimilarities. Before NMDS, we removed samples with fewer than 5,000 reads to exclude low-quality samples. We then thinned each remaining sample proportionally to the lowest read depth among the remaining samples to control for uneven sequencing depth. Bacterial communities may show different assembly patterns at different taxonomic levels (Goldford et al., 2018). We therefore calculated the Shannon index and performed NMDS at four taxonomic levels: ASV, species, genus, and family. We also visualized the dominant bacterial families with bar plots. Finally, we used the R package DESeq2 (Love et al., 2014) to identify the bacterial families whose abundances differed most between habitats.
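The diversity and distance steps above reduce to a few formulas. This is a minimal stdlib-only Python sketch of them, not the study's R/phyloseq code. The function names are hypothetical, and the Shannon index uses the natural logarithm, a common convention. The sketch assumes that "thinning" means random subsampling of reads without replacement (rarefaction) to a common depth, which is one way of equalizing read depth. The resulting Bray-Curtis matrix is what an NMDS routine would take as input; the NMDS fit itself is not shown.

```python
import math
import random


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def rarefy(counts, depth, rng):
    """Randomly subsample `depth` reads without replacement (assumed thinning)."""
    # Expand counts into one entry per read, then draw `depth` reads.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    thinned = {}
    for taxon in rng.sample(pool, depth):
        thinned[taxon] = thinned.get(taxon, 0) + 1
    return thinned


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den


def prepare_for_nmds(samples, min_reads=5000, seed=1):
    """Drop samples below `min_reads`, thin the rest to the minimum remaining
    depth, and return sample names plus the pairwise Bray-Curtis matrix."""
    kept = {s: c for s, c in samples.items() if sum(c.values()) >= min_reads}
    depth = min(sum(c.values()) for c in kept.values())
    rng = random.Random(seed)
    names = sorted(kept)
    thinned = {s: rarefy(kept[s], depth, rng) for s in names}
    dist = [[bray_curtis(thinned[i], thinned[j]) for j in names] for i in names]
    return names, dist


# Hypothetical toy counts: "s3" falls below 5,000 reads and is dropped.
samples = {"s1": {"a": 7000}, "s2": {"a": 3000, "b": 3000}, "s3": {"a": 100}}
names, dist = prepare_for_nmds(samples)
```

Here "s1" is thinned from 7,000 to 6,000 reads, and the dissimilarity between "s1" and "s2" is (3000 + 3000) / 12000 = 0.5. Thinning only the samples used for ordination matches the text, where the Shannon index is computed on raw counts.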
To estimate the temporal stability of bacterial communities, we sampled water more than once from five larval breeding sites in each habitat. The interval between consecutive collections ranged from 3 to 21 days, averaging 8.4 days in La Lopé and 17 days in Rabai. All temporal samples were sequenced, but only samples from the first collection were included in the analyses described above. We performed a separate NMDS analysis to examine variation among the repeated samples.
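The interval range and averages reported above come from simple date arithmetic. This Python sketch uses made-up dates, not the study's collection records. The function names are hypothetical. The sketch also assumes that intervals are pooled across sites before averaging; the text does not state whether the averages were computed this way or per site first.

```python
from datetime import date
from statistics import mean


def consecutive_intervals(dates):
    """Days between consecutive collections at one site (input need not be sorted)."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def interval_summary(collections):
    """Pool consecutive intervals across sites; return (min, max, mean) in days."""
    intervals = [d for dates in collections.values() for d in consecutive_intervals(dates)]
    return min(intervals), max(intervals), mean(intervals)


# Hypothetical collection dates for two sites (not study data).
collections = {
    "siteA": [date(2018, 6, 1), date(2018, 6, 4), date(2018, 6, 12)],
    "siteB": [date(2018, 6, 2), date(2018, 6, 23)],
}
shortest, longest, average = interval_summary(collections)
```

Here siteA contributes intervals of 3 and 8 days and siteB one interval of 21 days. The pooled mean is therefore 32 / 3 ≈ 10.7 days. Averaging per site first would give (5.5 + 21) / 2 = 13.25 days, which shows why the averaging choice matters.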