Comparison between short and long reads
Our results showed similar patterns of habitat diversity for long and
short reads, corroborating the patterns previously
reported10, 11, 25. These
similarities support the view that our findings are real and independent
of any possible methodological biases introduced by the different
markers and platforms.
The importance of soil properties for diversity and community
turnover varied among markers. We acknowledge the differing taxonomic
coverage of the markers and the limitations of the available databases.
For instance, the diversity of the early-diverging fungal lineages
Chytridiomycota, Cryptomycota, and Zoopagomycota was higher with 18S,
in stark contrast with the ITS and COI data. This difference may
be the result of either PCR biases or gaps in the reference
databases used. COI is usually used as a barcode for
metazoans80, with fewer sequences available for fungi.
In our COI data, around 40% of OTUs were
unidentified25, which could represent, at least in part,
fungal lineages without public reference sequences. Uneven availability
of reference sequences may have affected our diversity and
community composition results for the various markers used, with the
strongest effect on the COI results.
The use of short-read fragments (for both 18S and COI) resulted in a
higher number of OTUs, for all organisms, than did the long-read
technique. Long-read ITS, on the other hand, registered more fungal OTUs
even though the total number of OTUs was smaller than for short reads.
It is important to stress here that, unlike for the ITS region, for
short reads we used general primers targeting all eukaryotes and not
just fungi, such that only a portion of reads belonged to fungi in the
18S and COI datasets. Although the differences in primer design preclude
us from reliably identifying the “best” marker or sequencing platform
for fungal assessments in general, we highlight the main
advantages and disadvantages of those used here.
On the one hand, we showed that 18S sequenced on the Illumina
platform provides the highest overall taxonomic coverage. Thus, for
studies aiming to compare diversity and community turnover, the use of
short reads can be recommended. In economic terms, this is also the more
cost-efficient option at the moment. On the other hand, due to the short
fragment size of Illumina reads, some OTUs could be
misidentified or classified only at, for example, the family or genus
level. For instance, in an earlier study comparing taxonomic
identifications from short-read HTS, the choice of ITS sub-region (ITS1
or ITS2) affected 51% of fungal identifications16.
Long-read HTS methods have the potential to identify fungi with higher
accuracy, despite recording fewer sequences per
sample18. In our data, PacBio registered the highest
number of OTUs classified as fungi but the lowest number of total OTUs.
This is expected, since PacBio platforms yield a small number of reads in
total81 and do not sequence partially degraded
DNA. Additionally, long reads offer the potential to combine population
analysis with environmental data. This is constrained with short reads,
which capture less genetic variation for environmental
diversity analysis or require the sequencing of several markers for a
limited number of target individuals.