2.5 Data processing and statistical analysis
All statistical analyses were conducted in R 4.2.3 (R Core Team, 2021), primarily with the packages cluster, factoextra, phyloseq, vegan, and ggplot2. Nonmetric multidimensional scaling (NMDS) was used to group the 38 sampling sites into spatial zones with distinct fish community composition, based on relative OTU richness, number of individuals, and biomass. NMDS relies on the rank order of pairwise dissimilarities (Euclidean distance in this study) and makes no distributional assumptions about the data (Borcard et al., 2011). Sampling sites were plotted in ordination space such that the distance between points increased with the dissimilarity of the output parameters (i.e., sites with similar output parameters were plotted closer to one another). Analysis of similarities (ANOSIM) was applied to the dissimilarity matrix to test whether dissimilarities between groups of sites were significantly (P < 0.05) greater than those within groups.
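The ANOSIM logic described above can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the authors' R code; the function names, the toy coordinates, and the fixed random seed are assumptions made for illustration. The statistic is R = (mean rank of between-group dissimilarities − mean rank of within-group dissimilarities) / (n(n − 1)/4), and its significance is assessed by permuting the group labels.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def anosim_r(points, groups):
    """ANOSIM R statistic from ranked pairwise Euclidean distances."""
    pairs = list(combinations(range(len(points)), 2))
    ranks = average_ranks([euclidean(points[i], points[j]) for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    n = len(points)
    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return (mean_between - mean_within) / (n * (n - 1) / 4)


def anosim_test(points, groups, permutations=999, seed=0):
    """Return (R, permutation P value) by shuffling the group labels."""
    rng = random.Random(seed)  # fixed seed is an illustrative assumption
    observed = anosim_r(points, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(points, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: six sites in two well-separated zones.
sites = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
zones = ["A", "A", "A", "B", "B", "B"]
r_stat, p_value = anosim_test(sites, zones, permutations=999)
```

An R value near 1 indicates that sites within a zone are more similar to one another than to sites in other zones, whereas values near 0 indicate no grouping structure.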
Based on OTU richness, the pairwise taxonomic Bray–Curtis dissimilarity matrix between samples was calculated using the microeco package (Liu et al., 2021). Environmental factors and fish OTU richness that varied significantly were retained, and stepwise forward selection was performed to reduce the number of correlated variables along the axes. A permutation-based significance threshold (P value of 0.05) was used to determine which variables to include in the final model. Variance inflation factors were computed for the variables to check for confounding collinearity. The statistical significance of the axes derived from each analysis was tested with a Monte Carlo test (999 permutations). The relationship between eDNA-based and number-/biomass-based alpha diversity was estimated by linear regression.
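Two of the quantities named above, the Bray–Curtis dissimilarity and the variance inflation factor (VIF), can be sketched as follows. This is an illustrative Python sketch using only the standard library, not the microeco or vegan implementation; all function names are hypothetical. The VIF is taken here as the diagonal of the inverse of the predictor correlation matrix, which is equivalent to 1/(1 − R²) from regressing each predictor on the others.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0  # two empty samples are treated as identical (assumption)
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def pairwise_matrix(samples):
    """Symmetric matrix of Bray-Curtis dissimilarities between samples."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]
    inverse = invert(corr)
    return [inverse[i][i] for i in range(k)]
```

A VIF close to 1 indicates that a predictor is nearly uncorrelated with the others; large values flag predictors that are candidates for removal before model fitting. The cutoff to apply is not stated in the text and is left to the analyst.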
Linear discriminant analysis effect size (LEfSe) is an algorithm for high-dimensional indicator discovery that identifies the taxa characterizing the differences between two or more biological conditions (Segata et al., 2011). LEfSe emphasizes both statistical significance and biological relevance, allowing researchers to identify discriminative features that differ significantly between biological classes. First, the nonparametric factorial Kruskal–Wallis sum-rank test is used to detect features with significantly different abundances with respect to the class of interest. Second, linear discriminant analysis is used to estimate the effect size of each differentially abundant feature and to rank the features accordingly (Liu et al., 2021).
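The first LEfSe step, the Kruskal–Wallis test on a single feature, can be sketched as below. This is an illustrative Python sketch with the standard library only and does not reproduce the LEfSe software; the function names and the toy abundances are assumptions. The H statistic is corrected for ties and compared with a chi-square distribution with k − 1 degrees of freedom, where k is the number of classes.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    terms = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def kruskal_wallis(groups):
    """Return (H, P value) for a list of per-class abundance lists."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical relative abundances of one taxon in three classes.
h_stat, p_value = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

In LEfSe, only features that pass this screening step proceed to the linear discriminant analysis that estimates their effect size.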
Two-way hierarchical clustering analysis was performed using the pheatmap package, which uses the clustering distances and methods implemented in the dist and hclust functions in R. The clustering analysis grouped fish species with similar responses to the environmental factors. Statistically significant cluster trees were identified using a bootstrap randomization technique in which the nonzero values were resampled and used to generate pseudovalues under the null hypothesis. The results were displayed as a heatmap.
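The clustering step can be illustrated with a minimal agglomerative sketch in Python (standard library only). It assumes Euclidean distance and complete linkage, since the distance and linkage actually used are not stated in the text; the function names and the toy matrix are hypothetical, and the bootstrap assessment of cluster significance is not covered.

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def complete_linkage_clusters(rows, k):
    """Merge rows bottom-up until k clusters remain (complete linkage)."""
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of rows")
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(euclidean(rows[i], rows[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (a, b)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical matrix: rows are species, columns are environmental factors.
responses = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
species_groups = complete_linkage_clusters(responses, 3)

# "Two-way" clustering applies the same procedure to the transposed matrix,
# so that the columns (environmental factors) are grouped as well.
factors = [list(col) for col in zip(*responses)]
factor_groups = complete_linkage_clusters(factors, 1)
```

Each returned cluster is a list of row indices, which would correspond to one block of species (or factors) in the reordered heatmap.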