Bioinformatics
Here, three different bioinformatics pipelines were used to analyze the metabarcoding data. Comparisons among the metabarcoding data sets, and between microscopy and metabarcoding data, revealed highly correlated patterns (Table 2; Figs. 3-7). Although the Mantel correlation between microscopy and metabarcoding data was highest for the 95% OTU data set, the Procrustes correlation was highest for the ESV data set (Table 2). Thus, which pipeline performs best in terms of consistency with microscopic inventories may depend on the statistical method applied. Nevertheless, at the genus level, the ESV data set yielded slightly more genus-level identifications than the other data sets and therefore a marginally higher proportion of matches with the microscopy data. Interestingly, however, the ESV data set also contained one diatom taxon considered to be purely marine, Trachyneis sp. (Fig. 4; Table S3). The ESV assigned to this taxon comprised only three reads across the whole data set. It is uncertain whether this low-abundance ESV represents residual sequencing error or a genuine occurrence, but it warrants caution when data quality is judged by taxonomic richness alone. In any case, the differences between the bioinformatics workflows in this study were minor, and all of them yielded highly similar signals.
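Why the Mantel and Procrustes rankings can disagree becomes clearer from what each statistic compares. The Mantel test correlates all pairwise sample dissimilarities directly. Procrustes correlation instead compares sample positions in a low-dimensional ordination after optimal translation, scaling and rotation (or reflection). The NumPy sketch below illustrates both under stated assumptions: Bray-Curtis dissimilarity, a Pearson Mantel statistic with a permutation p-value, classical PCoA as the ordination, and the symmetric Procrustes correlation r = sqrt(1 - m^2) as reported by vegan's `protest`. The dissimilarity index and ordination method actually used in this study are not restated in this section, so these choices, all function names and the toy counts are hypothetical.

```python
import numpy as np


def bray_curtis(table):
    """Pairwise Bray-Curtis dissimilarities between rows (samples) of an abundance table."""
    x = np.asarray(table, dtype=float)
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            den = (x[i] + x[j]).sum()
            d[i, j] = d[j, i] = np.abs(x[i] - x[j]).sum() / den if den > 0 else 0.0
    return d


def mantel(d1, d2, permutations=999, seed=0):
    """Pearson Mantel r between two distance matrices and a one-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(d1, k=1)  # each sample pair counted once
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    hits = 0
    for _ in range(permutations):
        p = rng.permutation(d2.shape[0])
        # Shuffle sample labels: permute rows and columns of d2 together.
        r_perm = np.corrcoef(d1[iu], d2[np.ix_(p, p)][iu])[0, 1]
        hits += int(r_perm >= r_obs)
    return r_obs, (hits + 1) / (permutations + 1)


def pcoa(d, k=2):
    """Classical scaling (PCoA): first k sample coordinates from a distance matrix."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    b = -0.5 * j @ (d ** 2) @ j
    vals, vecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))


def procrustes_r(x, y):
    """Symmetric Procrustes correlation r = sqrt(1 - m^2) between two configurations."""
    def standardize(a):
        a = np.asarray(a, dtype=float)
        a = a - a.mean(axis=0)  # remove translation
        return a / np.sqrt((a ** 2).sum())  # unit total sum of squares
    x, y = standardize(x), standardize(y)
    # After optimal rotation/reflection, r equals the sum of singular values of x^T y.
    return float(np.linalg.svd(x.T @ y, compute_uv=False).sum())


if __name__ == "__main__":
    # Toy counts (samples x taxa), NOT data from this study.
    microscopy = np.array([[12, 0, 4, 1], [9, 2, 6, 0], [0, 15, 1, 3], [2, 8, 0, 6], [5, 5, 5, 5]])
    reads = np.array([[30, 1, 9, 0], [25, 3, 20, 1], [1, 40, 0, 12], [3, 18, 2, 20], [10, 14, 12, 9]])
    d_mic, d_mb = bray_curtis(microscopy), bray_curtis(reads)
    r_m, p_m = mantel(d_mic, d_mb)
    r_p = procrustes_r(pcoa(d_mic), pcoa(d_mb))
    print(f"Mantel r = {r_m:.3f} (p = {p_m:.3f}); Procrustes r = {r_p:.3f}")
```

Because the Mantel statistic uses every pairwise dissimilarity while the Procrustes statistic sees only the first k ordination axes, two data sets can rank differently under the two statistics even when both correlations are high.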
Although the results of this study were broadly consistent across pipelines, several studies have demonstrated that bioinformatics choices influence the outcomes of metabarcoding studies targeting other organisms (Majaneva, Hyytiäinen, Varvio, Nagai & Blomster 2015; Sinha et al. 2017; Anslan et al. 2018; Pauvert et al. 2019). These effects may originate from inadequate error filtering (Edgar 2017), inaccurate taxonomic annotation (Anslan et al. 2018) or, specifically for diatoms, inappropriate clustering methods (Tapolczai et al. 2019). Working towards a standardized analysis of short (312 bp) rbcL metabarcoding data of diatoms for biomonitoring, Tapolczai, Keck, Bouchez, Rimet and Vasselon (2019) and Rivera, Vasselon, Bouchez and Rimet (2020) found that approaches based on individual sequence units (ISUs) tend to outperform those based on operational taxonomic units (OTUs). Nevertheless, furthest-neighbor OTU clustering and the ESV approach performed equally well (Rivera, Vasselon, Bouchez & Rimet 2020). Our study likewise obtained highly similar results across the three applied pipelines (ESVs vs. two OTU approaches), suggesting that appropriate filtering of erroneous sequences and critical taxonomic assignment of the target taxa may be key steps that can mitigate the otherwise considerable effect of bioinformatics.
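The contrast between furthest-neighbor OTU clustering and the ESV approach can be sketched in a few lines of Python. Furthest-neighbor (complete-linkage) clustering merges two clusters only if every pair of member sequences lies within the threshold (here a 5% p-distance, i.e. 95% similarity), which prevents distant variants from being chained into one OTU. The ESV approach instead keeps each distinct denoised sequence as its own unit. This is a minimal sketch assuming equal-length, aligned and already error-filtered sequences, uncorrected p-distance and a naive O(n^3) merge loop. The function names and toy reads are hypothetical, and the code does not reproduce the clustering tools or parameters used in this study or by Rivera et al. (2020).

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def esv_units(seqs):
    """ESV-style units: each distinct (assumed already denoised) sequence is its own unit."""
    return sorted(set(seqs))


def furthest_neighbor_otus(seqs, threshold=0.05):
    """Complete-linkage clustering; returns clusters as lists of sequence indices."""
    clusters = [[i] for i in range(len(seqs))]
    dist = {(i, j): p_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}

    def linkage(c1, c2):
        # Furthest neighbor: cluster distance is the LARGEST pairwise distance.
        return max(dist[min(i, j), max(i, j)] for i in c1 for j in c2)

    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = linkage(clusters[a], clusters[b])
            if d <= threshold and (best is None or d < best[0]):
                best = (d, a, b)
        if best is None:  # no pair of clusters is within the threshold
            return clusters
        _, a, b = best  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]


if __name__ == "__main__":
    # Toy 20-bp reads, NOT rbcL data from this study.
    reads = ["A" * 20, "A" * 19 + "C", "A" * 18 + "CC", "G" * 20]
    print("ESVs:", len(esv_units(reads)))  # 4
    print("OTUs:", furthest_neighbor_otus(reads, 0.05))  # [[0, 1], [2], [3]]
```

In this toy case, the third read differs from the second by 5% but from the first by 10%, so furthest-neighbor clustering keeps it separate, whereas single-linkage clustering would chain all three A-rich reads into one OTU.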