Bioinformatic processing
Demultiplexed reads were provided by the sequencing company (Admera
Health Biopharma Services) and were checked for quality using MultiQC
(Ewels, Magnusson, Lundin, & Kaller, 2016). Forward and reverse primers
were removed using Trimmomatic (Bolger, Lohse, & Usadel, 2014). Reads
were further processed using the DADA2 pipeline (DADA2 v1.12.1; Callahan
et al., 2016) in R v3.6.2 (R Core Team, 2019) to obtain
amplicon sequence variants (ASVs). Standard filtering parameters were used,
except for the maximum number of expected errors allowed in a read, which
was set at 3. Reads were further trimmed to remove parts with a quality
score < 30 while retaining an overlap of at least 5 bp between forward and
reverse reads. For each sample and
each primer set, unique reads were determined, merged, and filtered for
chimeras. Taxonomy was assigned using a custom-made reference database
containing in-house and BOLD COI sequences from macrobenthic species
that have been found during monitoring campaigns in the Belgian part of
the North Sea over the last ten years. This reference database contained
346 Sanger COI sequences from 306 species. The newly generated COI
sequences have been uploaded to BOLD and are part of a larger study to
build a COI reference database for macrobenthos from the whole North Sea
region (https://northsearegion.eu/geans/). Taxonomy was assigned with
the naïve Bayesian classifier (Wang, Garrity, Tiedje, & Cole, 2007)
with the minimum bootstrap confidence set at 80 (minBoot = 80). Bar plots were
created in R to visualize the number of reads, ASVs and the percentage
of ASVs with assigned taxonomy for the ethanol and bulk samples for each
primer set. ASVs that did not receive a taxonomic assignment at the
phylum level using our custom reference database were extracted and
taxonomic assignment was repeated as above but now using the
MIDORI_UNIQUE_COI_MARINE_20180221 reference dataset (Machida et al.,
2017), downloaded from
http://genoweb.toulouse.inra.fr/frogs_databanks/assignation/, to ensure
that the lack of taxonomic assignment was not caused by the
reference database used. Only a small fraction of the data was
additionally assigned taxonomy when using the MIDORI dataset (see
Results). All further comparisons were therefore based on taxonomic
assignments from our custom-made reference database, because smaller
training datasets tailored to the taxa and geographic region of interest
have been shown to yield better genus- and species-level assignments
than the largest possible database (Macheriotou et
al., 2019; Ritari, Salojarvi, Lahti, & de Vos, 2015). For primer set A,
ASVs that remained unassigned after the MIDORI step were matched against
the NCBI nt database using BLASTn to check whether they were of
non-metazoan origin. Hits with a query coverage (qcov) > 50% and a
percent identity (pident) > 90% were considered reliable.
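
As an illustration of the hit filter described above, the following minimal Python sketch keeps only those rows of a tab-separated BLAST result table whose percent identity and query coverage both exceed the stated thresholds. The column order, the function name reliable_hits and the example identifiers are assumptions made for illustration and are not taken from the original analysis.

```python
import csv
import io

# Assumed (hypothetical) column layout of the tab-separated BLAST table.
COLUMNS = ("qseqid", "sseqid", "pident", "qcov")


def reliable_hits(handle, min_pident=90.0, min_qcov=50.0):
    """Return the rows whose pident and qcov both strictly exceed the thresholds."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    return [
        row
        for row in reader
        if float(row["pident"]) > min_pident and float(row["qcov"]) > min_qcov
    ]


# Made-up example rows: only ASV_1 passes both thresholds.
example = io.StringIO(
    "ASV_1\tsubject_a\t98.5\t100\n"
    "ASV_2\tsubject_b\t85.0\t95\n"
    "ASV_3\tsubject_c\t93.2\t40\n"
)
kept = reliable_hits(example)
print([row["qseqid"] for row in kept])
```

Both comparisons are strict, mirroring the "> 50" and "> 90" wording of the text, so a hit with exactly 90% identity or exactly 50% coverage would be discarded in this sketch.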