Introduction

Fungi are inconspicuous organisms, only a proportion of which sporadically reveal their presence through the formation of tangible morphological structures such as fruiting bodies1. The scientific study of fungi has therefore benefited immensely from the development of molecular (DNA) sequencing tools during the last 30 years. However, even with the use of molecular tools, studies involving the tropics have neglected fungi despite the fact that the majority of the undescribed fungi are thought to occur in the tropics2, 3, 4. Among all tropical biomes, rainforests provide the widest range of ecosystem services through high above- and below-ground biodiversity5, including water cycling and carbon storage6, 7. The largest and most diverse of those forests is Amazonia8,9, which comprises approximately 40% of the area occupied by rainforest habitats around the world. Amazonian ecosystem services can only be maintained through abiotic and biotic processes, many of which are mediated by fungi.
To better characterize fungal communities in Amazonia, short-read High-Throughput Sequencing (HTS) platforms such as Illumina are being increasingly used10 ,11 ,12 ,13 ,14. These approaches are often used together with PCR techniques to amplify individual markers. In particular, the nuclear ribosomal Internal Transcribed Spacer (ITS) region has been selected as the best DNA region to identify the widest possible range of fungal groups and is therefore commonly used as a universal DNA barcode for fungi15. However, this region is typically 500–600 bases long, preventing it from being sequenced under some sequencing technologies. The use of partial sequencing (targeting only a sub-region such as ITS1 or ITS2) has at times limited the taxonomic coverage and identification of fungi by not providing enough variation to tell species apart16. Furthermore, even though HTS approaches produce hundreds of thousands or millions of sequences per sample, the limited length of these sequences can introduce critical biases to the precise taxonomic identification of the underlying lineages17, 18.
Long-read HTS has the potential to overcome some of these limitations, but they have rarely been used in environmental studies18,19. One of the most well-developed platforms is the single-molecule real-time sequencing platform of Pacific Biosciences (PacBio)20. Although the PacBio platform had a high error rate at the time it was launched, the error rate is currently less than 1%21. Recent studies have shown that the potential of the PacBio platform for the identification of fungal communities using environmental samples is high18, 19, but so far they have not been widely applied to any ecosystems.
Taken together, the use of short- and long-sequence HTS techniques offers the potential to overcome the challenges of characterizing fungal diversity in species-rich ecosystems, such as Amazonia in northern South America. Amazonia is a heterogeneous biome, and its biodiversity has been shown to vary considerably across geographical ranges. On a large scale, a west (more diverse) to east (less diverse) diversity gradient has been observed in many animal and plant groups22, 23, 24 and also in micro-organisms, including fungi10, 25. Another source of heterogeneity in Amazonia is the presence of distinct habitats types. Each phytophysiognomy comprises a largely distinct biota and own soil characteristics, flooding regimes, and nutrient availability11, 26. Four widespread and important habitats, here given in the order of decreasing plant and animal diversity25, 26, are: unflooded tropical forests (terra-firme); forests seasonally flooded by fertile white-water rivers (várzeas); forests seasonally flooded by unfertile black water rivers (igapós); and naturally open areas associated with white-sand soils (campinas). The richness gradient for micro-organisms has been found to differ from this general trend, as campinas harbour the highest microbial richness10, 25.
Soil physicochemical characteristics are often considered crucial for biotic dynamics, vegetation, and diversity patterns at local to regional scales across Amazonia14, 27, 28, 29. Although several studies have reported on the importance of soil characteristics in shaping community structure, no unified pattern has emerged. In a recent study using HTS with short reads from environmental samples in Amazonia, members of our team showed a mixed effect of soil properties on the microorganism richness and community turnover11. In that study, we used general primers to target all eukaryotes, and we did not address specifically these effects on Fungi.
This study seeks to characterize fungal communities across Amazonia using environmental samples of soil and litter. For the first time (to our knowledge) in an Amazonian context, we use a long-read approach to sequence the full fungal ITS region on the PacBio platform. In addition, we combine our novel long-read data with our previously released short-read HTS data of the nuclear ribosomal 18S rRNA small subunit (18S) gene and the mitochondrial cytochrome c oxidase subunit I (COI) gene produced in a Illumina sequencing platform. We discuss the patterns of fungal richness and community turnover across Amazonia and compare the results obtained from different genes and platforms.

Methods

Study area and sampling design

We sampled four localities across Brazilian Amazonia (Fig. 1) following the sampling design described by Tedersoo et al.12. Detailed locality descriptions are available in Ritter et al.10. Briefly, we sampled all depths of the litter layer above the mineral soil (all organic matter, including leaves, roots, and animal debris) and the top 5 cm of the mineral soil in a total of 39 circular plots, each with a radius of 28 m. We chose 20 random trees inside each plot and collected litter and soil on both sides of each tree. We then pooled the samples by substrate to produce one litter sample and one soil sample per plot. The soil physicochemical properties were determined by a Brazilian company (EMBRAPA); additional details of the soil analysis can be found in Ritter et al.11.

Data generation

For the nuclear ribosomal small subunit (SSU) 18S rRNA (18S) and the mitochondrial cytochrome c oxidase subunit I (COI) genes, we used the OTU table produced in Ritter et al.25. We selected the OTUs assigned to the fungal kingdom based on SILVA30 for 18S and GenBank31 for COI datasets, respectively, for all our analyses. We present here the results of both markers in light of the fact that the previous publication did not analyse fungi separately, which imposed limits on the fungal richness and community structure analyses employed at the time.
For the ITS, we used the approach described in Tedersoo et al.18. We used the forward primers ITS9MUNngs (5′‐TACACACCGCCCGTCG‐3’32) and ITS4ngsUni (5′‐CCTSCSCTTANTDATATGC‐3’32) to target the full ITS region (ITS1 - 5.8S - ITS2)”. For amplification, we used a PCR mixture comprised 5 ul of FirePol DNA polymerase master mix (Solis Biodyne, Tartu, Estonia), 0.5 ul of each forward and reverse primer (20 mM), 1 ul of DNA extract and 18 μl ddH2O. Thermal cycling included an initial denaturation at 95 °C for 15 min; cycles of denaturation for 30 s at 95 °C, annealing for 30 s, elongation at 72 °C for 1min; final elongation at 72 °C for 10 min and storage at 4 °C. The duplicate PCR samples were pooled; their relative quantity was estimated by running 5 μl DNA on 1% agarose gel stained with ethidium bromide (Sigma-Aldrich, St Louis, MO, USA). We used negative (for DNA extraction and PCR) controls throughout the experiment. The amplicons were purified with FavorPrep PCR Clean Kit (FavorGen Biotech Corporation, Vienna, Austria). The libraries were prepared using PacBio amplicon library preparation protocol (Pacific Biosciences, Inc, Menlo Park, Ca, USA) and loaded to seven SMRT cells using the MagBead method. The libraries were sequenced using the PacBio RS II instrument using P6-C4 chemistry following the manufacturer´s protocol.
Bioinformatics analyses were performed using the PipeCraft platform33. PacBio circular consensus reads (CCS, reads_of_insert) were quality filtered with vsearch34 (maxee = 2, maxns = 0, minlen = 150). Filtered reads were demultiplexed based on the unique sequence identifiers using mothur35 (bdiffs = 1). Putative chimeric reads were filtered using denovo and reference-database-based methods in vsearch. Additionally, sequences where the full PCR primer was found anywhere in the read were filtered out using the PipeCraft built-in module, as these reads represent additional chimeras not detected by the vsearch method. The full ITS region was extracted using ITSx36 and clustered using the UPARSE algorithm37 with a 98% similarity threshold. Additionally, the post-clustering curation method LULU38 was applied (minimum_ratio_type = “min”, minimum_match = 98) to merge consistently co-occurring ‘daughter’ OTUs. Taxonomy annotation was performed using BLASTn39against the UNITE40, 41 and INSDC42databases.

Statistical analysis

We performed all statistical analyses in R v.3.6.043. Two samples (SCUICAMP3 and LCUITFP3) had a very low number of reads in the ITS results, and were excluded from subsequent analyses of all markers. We use as a diversity estimative the effective number of OTUs, calculated with the unrarefied read counts as OTU abundance, using the exponential of the Shannon entropy diversity of order q = 144. This measure is more robust against biases arising from uneven sampling depth than the simple number of OTUs45. For the abundance-based community matrices, we transformed read counts using the “varianceStabilizingTransformation” function in DESeq246 as suggested by McMurdie and Holmes45. This transformation normalizes the count data with respect to sample size (number of reads in each sample) and variances, based on fitted dispersion-mean relationships46.
We tested the correlation between diversity of each marker through a Pearson correlation between each pair of markers. To test between the community composition correlation, we performed a Mantel test with the Jaccard dissimilarity matrices, using the Pearson correlation and 999 permutations for significance. Both analyses were performed using the vegan v.2.5.5 R package47.
For soil physicochemical analysis, we first normalized all variables to mean = 0 and variance = 1. We then performed two principal component analyses (PCA), one for soil grain size and the other for chemical compounds, using the vegan package. We used the first axis of each PCA (explaining 56% and 69% of the total variation, respectively) in the subsequent linear models and multiple regressions analysis. Given the expected importance of soil organic carbon content11,48 and pH11, 49, we used these as independent variables.
/
To test the effect of soil properties on fungal OTU richness, we performed a Bayesian general linear model (GLM) analysis, as implemented in the R-INLA v.17.6.20 R package50. The response variables were the OTU diversity by soil layer (litter and soils) and marker (18S, ITS and COI), giving a total of six models. In each case, the soil properties (PC1 for the physical, PC1 for the chemical, organic carbon content, and pH both standardized to mean = 0 and variance = 1) were used as explanatory variables. We tested the effect of spatial autocorrelation by comparing analyses of standard GLMs with GLM analysis using stochastic partial differential equations (SPDE) that explicitly consider spatial correlation.
To test the effect of soil properties on fungal community turnover, we used multiple regressions on dissimilarity matrices (MRM) with the R package ecodist v.2.0.151. The response variables were dissimilarity matrices calculated using the Jaccard dissimilarity. In each case, the explanatory variables were the distance matrices based on soil properties (physical PC1, chemical PC1, organic carbon, and pH) and one geographical distance matrix (all calculated using Euclidean distances). Statistical significance of the regression coefficients was determined using 10,000 permutations.
For the analysis of differences of community composition by locality and habitat, we performed a non-metric multi-dimensional scaling (NMDS) analysis using the Jaccard dissimilarity matrix and tested the significance of groups using the envfit test, which fits vectors of continuous variables - in this case the NMDS axes - and centroids of levels of class variables (locality, habitat, and soil layer) using the vegan package. Additionally, we performed a permutational analysis of variance (PERMANOVA) to test the significance of each factor (locality, habitat, soil layer, first PC of both PCAs, pH, and carbon) in the community composition of each dataset (18S, COI, and ITS) using the vegan package. We furthermore performed an analysis of indicator OTUs of each locality, habitat, and soil layer using the R package indicspecies v.1.7.652 using the matrix of relative abundance. This analysis identifies the species, in our case the OTUs, that are associated with a determined group. We performed the analysis three times with each dataset (18S, COI, and ITS): the first grouping the OTUs by locality, the second by habitat, and the third by soil layer. We tested the significance with 9,999 permutations, from which we quantified the number of indicator OTUs for each group with an alpha < 0.05. Following this, based on literature and experience, V.X.L. classified all possible indicator OTUs in their functional guild group, such as mycorrhizae, phytopathogen, and saprobe based in their assigned taxonomy (Table S3). We calculated the mean number of OTUs by each factor (locality, habitat, and soil layer) in each dataset (18S, COI, and ITS) using the vegan R package. We produced a Venn diagram for visualization of the number and proportion of exclusive and shared OTUs for each factor (locality, habitat, and soil layer) in each dataset (18S, COI, and ITS) using the online tool Venny 2.053. Additional R packages used for data curation were tidyverse v.1.2.154 and ggplot2 v.3.1.155. All scripts and data used in the analyses are available as supplementary material.

Results