FIGURE 1  Map of the study area. (a) The location of Northwest Yunnan in China; (b) The topographic map of Northwest Yunnan and the distribution of national nature reserves (NNRs) and provincial nature reserves (PNRs) in this region
Drawing on our previous research, a total of 114 key higher plant species were identified in Northwest Yunnan (Ye et al., 2020a). The information used to construct the dataset (e.g., taxonomic level, threat level, and geo-referenced records) was obtained from field surveys conducted over nearly a decade and from the main virtual herbaria in China (for more details, see Ye et al., 2020a). In this study, we selected 25 species (314 distribution records) from the 114 key higher plant species (941 geo-referenced records). Selection was based on the combination of two criteria: (1) occurrence records: to improve the accuracy of the MaxEnt model predictions as far as possible, each species had to have at least four distribution records; (2) simulation accuracy: the MaxEnt model had to achieve good or excellent simulation accuracy for the included species (for more details, see Section 2.4.1).

2.2 Environmental variables

Based on previous studies (Nieto et al., 2015; Ștefănescu et al., 2017; Zhang et al., 2019; Liu et al., 2019), we initially selected 24 environmental variables that may affect species distribution to model the current potential geographical distribution patterns (Table 1). First, we divided these variables into five groups according to their categories. The 24 variables were then resampled and reprojected to an equal-area grid system with the same spatial resolution (0.05° × 0.05°) as the species richness data (Wang et al., 2018). Finally, we used ArcGIS 10.4 (Esri; Redlands, California, USA) to extract the raster data of the environmental variables.
To avoid multicollinearity among the environmental variables, which might result in model over-fitting, we calculated the Pearson correlation coefficient between each pair of variables in R 3.5.2 (https://www.r-project.org/). After the multicollinearity test, we retained 11 independent environmental variables (r < 0.8) to model the potential geographical distribution of each species (Zhang et al., 2019; Mukherjee et al., 2020) (Figure 2).
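The correlation screening described above can be illustrated with a minimal Python sketch. It is not the authors' R workflow: the text does not state which member of a correlated pair was discarded, so the greedy, order-based rule used here is an assumption, as are the variable names (`bio1`, `bio2`, `slope`) and the sample values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_collinear(variables, threshold=0.8):
    """Greedy filter (an assumed rule, not taken from the paper).

    Walks the variables in input order and keeps one only if its absolute
    correlation with every already-kept variable is below the threshold.
    """
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values sampled at five grid cells.
example = {
    "bio1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bio2": [2.1, 3.9, 6.2, 8.0, 9.9],  # almost a linear function of bio1
    "slope": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(drop_collinear(example))
```

Because the filter is order-dependent, variables judged more ecologically meaningful would be listed first so that they are the ones retained.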

2.3 Model construction

We constructed two models in this study: one to predict the potential geographical distribution of species, and the other to analyze the main environmental factors influencing that distribution.

2.3.1 Construction of the MaxEnt model

Phillips et al. (2004) developed the MaxEnt model based on the theory of maximum entropy, and it has since been applied to simulating species distributions (Phillips et al., 2006; Zhang et al., 2011). The model combines species occurrence localities with environmental parameters to predict habitat suitability, and thereby estimates the possible distribution of a species in the study area (Phillips et al., 2006; Phillips & Dudík, 2008; Zhang et al., 2019).
In this study, the latitude and longitude of the species distribution sites and the independent environmental factors in Northwest Yunnan were imported together into MaxEnt (version 3.3.3k) to construct the correlation function between species and the environment. The prediction results of MaxEnt depend on several parameter settings, such as the maximum number of background points (BC), the regularization multiplier (RM), and the feature combination (FC) (Zhu et al., 2018). MaxEnt was run under the following rules: (1) for species with < 10 distribution records, linear features were applied; (2) for species with 10-14 records, quadratic features were used; and (3) for species with > 15 records, hinge features were employed (Zhang et al., 2012; Zhang et al., 2017). We varied RM over [0.5, 3] in steps of 0.5 and BC over [5000, 15000] in steps of 5000, and then constructed the MaxEnt model with the linear, quadratic, and hinge features, respectively. In addition, 75% of the species distribution locations were randomly selected as training data to build the model, and the remaining 25% were used as testing data for model validation (Guan et al., 2018; Zhang et al., 2019). The maximum number of iterations was set to 500, and the number of replicate runs was set to 10 or 20.
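The parameter settings above can be summarized in a short Python sketch. It only enumerates the configurations and does not call MaxEnt. All function names are hypothetical, and the random split is a stand-in for MaxEnt's own partitioning, not the procedure the software uses. The text leaves species with exactly 15 records unassigned, so grouping them with the hinge class is an assumption made here.

```python
import random
from itertools import product


def feature_class(n_records):
    """Feature type by number of distribution records.

    < 10 -> linear, 10-14 -> quadratic, otherwise hinge. Treating exactly
    15 records as hinge is an assumption; the text only says "> 15".
    """
    if n_records < 10:
        return "linear"
    if n_records <= 14:
        return "quadratic"
    return "hinge"


def parameter_grid():
    """All (RM, BC) pairs: RM from 0.5 to 3 in steps of 0.5,
    BC from 5000 to 15000 in steps of 5000."""
    rm_values = [0.5 * i for i in range(1, 7)]
    bc_values = list(range(5000, 15001, 5000))
    return list(product(rm_values, bc_values))


def split_records(records, train_fraction=0.75, seed=0):
    """Illustrative random split into training and testing records."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


grid = parameter_grid()
train, test = split_records(range(20))
print(len(grid), feature_class(12), len(train), len(test))
```

Under these settings the grid contains 6 RM values and 3 BC values, so 18 candidate configurations would be evaluated per species for its assigned feature class.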