Bioinformatic processing
The total number of sequences generated for each primer set ranged between 2 786 114 and 3 845 487 (ESM Table 1). One sample (ZVLA_B) yielded very low read numbers for all primer sets. Primer set B yielded the highest number of raw reads, but almost half of these were lost during the filtering step. For primer sets A and E, relatively few reads were lost during filtering, denoising and chimera removal, leaving more than 2.5 million reads for further processing (ESM Table 1, ESM Fig 1). Read numbers were comparable between bulk and ethanol samples for each primer set except primer set A, for which approximately three times more reads were obtained from the bulk samples than from the ethanol samples (ESM Fig 2A). Nevertheless, a comparable number of ASVs was obtained for the ethanol and bulk samples with primer set A (average of 149 and 132, respectively; ESM Table 1, ESM Fig 2B). For primer sets C and E, more ASVs were found in the ethanol samples than in the bulk samples (ESM Fig 2B). The total number of ASVs generated across the 24 samples varied substantially between primer sets and was lowest for primer sets A and E (2139, 22151, 14813, 15211 and 5230 ASVs for primer sets A, B, C, D and E, respectively) (ESM Fig 3).
The percentage of ASVs that were assigned taxonomy using our custom COI reference database was low (22.6%, 12.1%, 11.7%, 4% and 10.9% for primer sets A, B, C, D and E, respectively; ESM Table 2). However, for primer set A, the 1655 unassigned ASVs represented only 13.4% of the total number of non-chimeric reads generated for that primer set. This percentage was considerably higher for the other primer sets and ranged between 38.4% and 81.7% (ESM Table 2). Phylum-level assignments were comparable between the ethanol and bulk samples for primer sets B, C and D, while more ASVs from the bulk samples were assigned to phyla compared to the ethanol samples for primer sets A (bulk: 30%, ethanol: 20%) and D (bulk: 20%, ethanol: 9%) (Fig 2). At the species level, taxonomic assignment of the bulk samples was highest for primer set A (25%, 9%, 10%, 3% and 16% for primer sets A, B, C, D and E, respectively). When the COI Midori dataset was used for taxonomic assignment of the unassigned sequences, only a small fraction were assigned at the phylum level (8.6%, 0.6%, 1.1%, 5.7% and 5.8% for primer sets A, B, C, D and E, respectively), and these assigned ASVs represented 5.3%, 0.5%, 0.8%, 6.0% and 20.6% of the total non-chimeric reads for primer sets A, B, C, D and E, respectively. For primer sets A and E, most reads were assigned to the cnidarian Obelia bidentata (1.3% and 5.1%, respectively), which was found in all ethanol samples, in all bulk samples of locations 120 and 330, and in very low abundance in one bulk sample of location 840 (37 and 22 reads, respectively). A detailed list of all species detected after taxonomic assignment with the Midori dataset for each primer set is available in ESM Table 3. To investigate whether the ASVs that remained unassigned after the Midori search were of non-metazoan origin, a blastn search against the NCBI nt database was performed for primer set A.
This resulted in only 83 of the 1471 unassigned ASVs receiving a reliable assignment (query coverage >50%, identity >90%), representing 58 species. All species had low read numbers, except for Limecola balthica, which was detected in the bulk DNA of two replicates of the Limecola balthica community (ZVL) with more than 10 000 reads (ESM Table 4). The non-metazoan taxa were represented by three fungal, two bacterial, five Viridiplantae and 27 algal or diatom species, which together represented only 0.3% of the total non-chimeric reads (ESM Table 4).
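The reliability criterion applied to the blastn hits (query coverage >50%, identity >90%) can be illustrated with a minimal Python sketch. This is not the script used in the study. It assumes a hypothetical tab-separated BLAST output with four columns (query ID, subject ID, percent identity, percent query coverage), and it assumes that when several hits pass the thresholds, the one with the highest identity is kept. The function name `reliable_hits` and the example records are invented for illustration only.

```python
import csv
import io

# Thresholds taken from the text; both are percentages.
MIN_QUERY_COVERAGE = 50.0
MIN_IDENTITY = 90.0


def reliable_hits(tsv_text):
    """Return {query_id: (subject_id, identity, coverage)} for reliable hits.

    Assumed (hypothetical) column order: qseqid, sseqid, pident, qcovs.
    If several hits pass for one query, the highest-identity hit is kept
    (an assumption, not a rule stated in the source).
    """
    best = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if not row:
            continue  # skip blank lines
        qseqid, sseqid = row[0], row[1]
        pident, qcovs = float(row[2]), float(row[3])
        # Both thresholds are strict, matching ">50" and ">90" in the text.
        if qcovs <= MIN_QUERY_COVERAGE or pident <= MIN_IDENTITY:
            continue
        if qseqid not in best or pident > best[qseqid][1]:
            best[qseqid] = (sseqid, pident, qcovs)
    return best


# Invented example records, not data from the study.
example = "\n".join([
    "ASV_1\tsubject_a\t98.5\t100",
    "ASV_1\tsubject_b\t91.0\t80",
    "ASV_2\tsubject_c\t85.0\t100",  # fails the identity threshold
    "ASV_3\tsubject_d\t99.0\t40",   # fails the coverage threshold
])
hits = reliable_hits(example)
```

In this toy input only ASV_1 retains an assignment. ASV_2 and ASV_3 each fail one of the two thresholds and would remain unassigned.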