Figure captions
Figure 1. Quality profiles for selected samples in each study. Each
panel shows the quality profile for the sample with the highest (top)
and lowest (bottom) number of reads for each study. The gray scale
indicates the frequency of each quality score at each base position
(darkness indicates a higher frequency). Green and orange lines indicate
the mean and quartiles of the quality score at each position, respectively.
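The per-position summary shown in these profiles (mean and quartiles of the quality score at each base position) can be sketched as follows. This is a minimal pure-Python illustration, not dada2's plotting code. It assumes FASTQ quality strings with Phred+33 encoding (the usual encoding for Illumina 1.8+ data), and the function name `quality_profile` is hypothetical. dada2 may compute the quantiles slightly differently.

```python
import statistics


def quality_profile(quality_strings, offset=33):
    """Per-position mean and quartiles of Phred quality scores.

    Assumes Phred+33 encoding. Reads may differ in length, so each
    position is summarized over only the reads that reach it.
    """
    max_len = max(len(q) for q in quality_strings)
    profile = []
    for pos in range(max_len):
        # Decode the ASCII character at this position into a Phred score.
        scores = [ord(q[pos]) - offset for q in quality_strings if len(q) > pos]
        if len(scores) == 1:
            # quantiles() needs at least two points on Python < 3.13.
            q1 = median = q3 = scores[0]
        else:
            q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        profile.append({
            "position": pos + 1,  # 1-based base position, as in the plots
            "n_reads": len(scores),
            "mean": statistics.mean(scores),
            "q1": q1,
            "median": median,
            "q3": q3,
        })
    return profile


# Two toy reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
for row in quality_profile(["IIII", "II55"]):
    print(row)
```

Counting only the reads that reach each position keeps the sketch usable for length-trimmed or variable-length reads.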
Figure 2. Percentage of reads retained after standard processing with
dada2 (Ben J Callahan et al., 2016) without (top) and with (bottom)
chimera checking. Values are shown for the control samples (n=5) of each
study, as a percentage of the original reads retrieved from the INSDC
databases.
Figure 3. Relationship between taxonomic classification and read length
for the control samples (n=5) of each dataset. Color indicates taxonomic
level.
Figure 4. Relationship between alpha diversity and read length. Richness
(q=0, top) was calculated as the number of ASVs per sample, and the
inverse Simpson’s index (q=2, bottom) was calculated following
Chao et al. (2014). Diversity was assessed in the control samples (n=5)
at each read length.
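Both indices in Figure 4 are Hill numbers of order q, qD = (Σ p_i^q)^(1/(1−q)), where p_i is the relative abundance of ASV i. Richness corresponds to q=0 and the inverse Simpson's index to q=2. The sketch below is a minimal illustration, assuming raw per-ASV read counts as input. It computes only the observed (sample) values and applies no correction for undetected ASVs. The function name `hill_number` is hypothetical.

```python
import math


def hill_number(counts, q):
    """Observed Hill number of order q from per-ASV read counts.

    q = 0 -> richness (number of ASVs with nonzero counts)
    q = 1 -> exp(Shannon entropy), the limit of the formula as q -> 1
    q = 2 -> inverse Simpson index, 1 / sum(p_i ** 2)
    """
    total = sum(counts)
    p = [c / total for c in counts if c > 0]  # relative abundances
    if q == 0:
        return float(len(p))
    if q == 1:
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


sample = [90, 10, 0]  # hypothetical read counts for three ASVs
print(hill_number(sample, 0))  # richness → 2.0
print(hill_number(sample, 2))  # inverse Simpson, about 1.22
```

Because both indices come from the same formula, a single function covers the top and bottom panels. Changing q sets how strongly abundant ASVs dominate the result.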
Figure 5. Variation in community composition with increasing read
length. The mean pairwise dissimilarity among the 5 control samples in
each study was assessed using Sørensen (a) and Bray-Curtis (b)
dissimilarities.
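The two dissimilarities and the mean pairwise summary in Figure 5 can be sketched in pure Python as below. Sørensen uses only presence/absence (it equals Bray-Curtis computed on presence/absence data). Bray-Curtis weights ASVs by abundance. The caption does not say whether counts were rarefied or converted to relative abundances first. This sketch therefore assumes per-ASV counts aligned on a common ASV axis and non-empty samples. All function names are hypothetical.

```python
from itertools import combinations


def sorensen(x, y):
    """Sørensen dissimilarity: 1 - 2|A ∩ B| / (|A| + |B|) on ASV presence/absence."""
    a = {i for i, v in enumerate(x) if v > 0}
    b = {i for i, v in enumerate(y) if v > 0}
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum |x_i - y_i| / sum (x_i + y_i) on abundances."""
    return sum(abs(u - v) for u, v in zip(x, y)) / sum(u + v for u, v in zip(x, y))


def mean_pairwise(samples, dissimilarity):
    """Mean dissimilarity over all unordered pairs of samples."""
    pairs = list(combinations(samples, 2))
    return sum(dissimilarity(x, y) for x, y in pairs) / len(pairs)


# Hypothetical counts for three samples over the same four ASVs.
controls = [[10, 0, 5, 1], [8, 2, 0, 1], [12, 1, 4, 0]]
print(mean_pairwise(controls, sorensen))
print(mean_pairwise(controls, bray_curtis))
```

With 5 control samples, `mean_pairwise` averages over 10 pairs per study and read length.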
Figure 6. Information loss from shorter read lengths. For each dataset,
Mantel tests evaluated the correlation between the Sørensen (a) and
Bray-Curtis (b) dissimilarity matrices computed from 200-bp reads, the
most information-rich version of the dataset, and those computed at each
shorter read length.
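A Mantel test of this kind correlates the off-diagonal entries of two dissimilarity matrices computed on the same samples. It judges significance by permuting the sample labels of one matrix. The sketch below is a minimal pure-Python version using Pearson's r, which is also the default in vegan's `mantel()` in R. The caption does not state which correlation or how many permutations were used, so the function names, the 999 permutations, and the fixed seed are assumptions.

```python
import math
import random


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def _upper_triangle(matrix, order):
    """Entries above the diagonal after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=999, seed=1):
    """Mantel r between two square dissimilarity matrices over the same samples,
    with a one-sided permutation p-value (r_perm >= r_obs)."""
    identity = list(range(len(d1)))
    x = _upper_triangle(d1, identity)
    r_obs = _pearson(x, _upper_triangle(d2, identity))
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of d2 together, i.e. shuffle its sample labels.
        rng.shuffle(order)
        if _pearson(x, _upper_triangle(d2, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

In the Figure 6 setting, `d1` would hold the 200-bp dissimilarities and `d2` those for one shorter read length, with rows and columns in the same sample order. An r near 1 indicates little information loss.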
Figure 7. Outcome of statistical tests comparing control and disturbed
communities for each dataset across read lengths. Kruskal-Wallis tests
(top) evaluated differences in richness and inverse Simpson’s index
between control and recently disturbed samples (1 day, n=5 for each
time point) in each study. Similarly, PERMANOVA tests (bottom) evaluated
differences in community composition between control and recently
disturbed samples using Sørensen and Bray-Curtis dissimilarities.
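The two tests in Figure 7 can be sketched for a two-group comparison (control vs. recently disturbed) as follows. This is a minimal pure-Python illustration, not the authors' code. The Kruskal-Wallis H statistic includes the usual tie correction. Its p-value comes from a chi-square distribution with one degree of freedom, which is valid only for two groups. PERMANOVA uses the standard pseudo-F statistic computed from a distance matrix (Sørensen or Bray-Curtis here), with a p-value from shuffling group labels. Function names, the permutation count, and the seed are assumptions.

```python
import math
import random


def _average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Tie-corrected Kruskal-Wallis H and chi-square (df = 1) p-value for two groups."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = _average_ranks(pooled)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:  # every value tied: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    # Upper tail of chi-square with df = 1: P(X > h) = erfc(sqrt(h / 2)).
    return h, math.erfc(math.sqrt(h / 2))


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, permutations=999, seed=1):
    """Pseudo-F and permutation p-value, shuffling group labels across samples."""
    f_obs = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)
```

With n=5 samples per group, the label shuffles cover only a limited number of distinct groupings. This caps how small the permutation p-value can get, whatever the number of permutations.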