The absence of morphological species in DNA metabarcoding datasets is not necessarily linked to the reference databases used for taxonomic assignment
Importantly, DNA metabarcoding did not detect all morphologically identified metazoan species, even though our reference database included 51 of the 57 morphospecies. Our results illustrate that an incomplete reference database is not the only cause of the low number of ASVs with taxonomic assignments. When all biological replicates were included, our best-performing primer set (A) failed to detect 19 of the 59 morphological species. Only six of these 19 species lacked a reference sequence, indicating that factors other than the reference database influence species detection in metabarcoding datasets. First, the wet-lab procedure can cause species to be missed, either because DNA extraction efficiency differs between species or because the primers match the COI gene of those species poorly. All 19 species are relatively large animals, and for several of them more than one individual was found in a given sample. Several species also have soft tissues, so it is unlikely that DNA extraction was problematic for them. Eleven species belong to the Polychaeta, a class characterized by high species and sequence diversity in the COI gene (Carr, 2012). It is therefore likely that the primers were a poor match for these species. This interpretation is supported by the fact that for six polychaete species voucher specimens were available and good-quality DNA was extracted, yet either no PCR product or only poor-quality Sanger sequences were obtained. This illustrates the great benefit of primer-free methods for biodiversity assessment, although these are currently more expensive than DNA metabarcoding (Giebner et al., 2020). Second, taxonomic assignment with the RDP classifier may be inefficient. One species, Acrocnida brachiata, was identified using the Midori reference dataset, which contains 151 COI sequences of this species, including a sequence that is identical to our own reference sequence.
The content and size of the training set are known to strongly affect taxonomic assignment with the RDP classifier (Ritari et al., 2015). Finally, the morphological identification by our experts may have been incorrect. However, the morpho-taxonomic analysis of macrobenthos is accredited (accreditation certificate nr. 315-TEST, following NBN EN ISO/IEC 17025:2005), and the two experts who conducted the morphological identification have very low misidentification rates (at most 3 taxa have been misidentified in a sample that underwent quality control over the last 9 years), suggesting that misidentification can have only a minor impact on our results.