Bioinformatics
Here, three different bioinformatics pipelines were used to analyze
metabarcoding data. Comparative analyses among the metabarcoding data
sets and between microscopy and metabarcoding data revealed
highly correlated patterns (Table 2, Fig. 3-7). Although the comparisons
between microscopy and metabarcoding data showed the highest Mantel
correlation for the 95% OTUs data set, the Procrustes
correlation was highest for the ESVs data set (Table 2). Thus, the
identification of the best performing pipeline, in terms of consistency
with microscopic inventories, may depend on the applied statistical
method. Nevertheless, considering the genus level comparisons from
metabarcoding, the ESVs data set had a slightly higher number of genus
level identifications, and therefore a (marginally) higher match
proportion with microscopy data. Interestingly, the ESVs data set
also harbored one diatom taxon that is considered to be purely marine,
Trachyneis sp. (Fig. 4; Table S3). The ESV assigned to this
taxon, however, consisted of only three reads across the whole data set.
It is uncertain whether this low abundance ESV assigned to
Trachyneis sp. represents a remaining sequencing error or a real
occurrence, but it calls for caution when evaluating data quality based on
highest taxonomic richness alone. In any case, the differences between
bioinformatics workflows in this study were minor and indicated highly
similar signals among all of them.
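
To make concrete what a Mantel correlation between two inventories measures, the following is a minimal sketch in plain Python, not the analysis code used in this study. It correlates the pairwise sample dissimilarities of two data sets and assesses significance by permuting sample labels. The Bray-Curtis dissimilarity, the number of permutations, the function names and the toy count tables are all assumptions made for illustration; none of them are taken from the study. A Procrustes analysis, which compares ordination configurations rather than dissimilarities directly, is not shown here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors (assumed metric)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(samples):
    """Square matrix of pairwise dissimilarities between samples (rows)."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


def upper_triangle(d, order):
    """Upper-triangle entries of d after relabeling samples by `order`."""
    n = len(order)
    return [d[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=999, seed=0):
    """Mantel statistic r and one-sided permutation p-value."""
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute sample labels of the second matrix only
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical toy data: 5 samples (rows) x 4 genera (columns).
microscopy = [[10, 0, 5, 1], [8, 1, 6, 0], [0, 12, 1, 7],
              [1, 10, 0, 9], [4, 4, 4, 4]]
metabarcoding = [[900, 10, 400, 50], [700, 60, 500, 0], [20, 1100, 90, 600],
                 [80, 950, 10, 800], [300, 350, 420, 380]]

r, p = mantel(distance_matrix(microscopy), distance_matrix(metabarcoding))
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Because the Mantel statistic works on the full set of pairwise dissimilarities while Procrustes works on a reduced ordination space, the two methods can rank the same pipelines differently, as observed in Table 2.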
Although this study displayed an overall consistency of results obtained
by different pipelines, several studies have demonstrated the influence
of bioinformatics in metabarcoding studies targeting other organisms
(Majaneva, Hyytiäinen, Varvio, Nagai & Blomster 2015; Sinha et
al. 2017; Anslan et al. 2018; Pauvert et al. 2019). The
impacts may originate from inadequate error filtering processes (Edgar
2017), inaccurate taxonomic annotation (Anslan et al. 2018) or
inappropriate clustering methods in the specific case of diatoms
(Tapolczai et al. 2019). Towards standardizing the analysis of
short (312 bp) rbcL metabarcoding data of diatoms for biomonitoring
purposes, the studies of Tapolczai, Keck, Bouchez, Rimet and Vasselon
(2019) and Rivera, Vasselon, Bouchez and Rimet (2020) found that
the individual sequence units (ISU) approach tends to outperform operational
taxonomic unit (OTU) based approaches. Nevertheless, furthest neighbor
OTU clustering and the ESVs approach were shown to perform equally well (Rivera,
Vasselon, Bouchez & Rimet 2020). Our study also yielded highly
similar results across the three applied pipelines (ESVs vs. two OTU
approaches), which suggests that appropriate filtering of
erroneous sequences and critical taxonomic assignment of the target taxa
may be key steps, with the potential to mitigate the otherwise
considerable effect of bioinformatics.