Materials and methods

Study region

Our study was conducted in the southern part of the Korean Peninsula, which is located in East Asia (33°–38° N, 125°–131° E; Fig. 1). The total area of the study region is 100,033 km², and the human population is 51 million (Ministry of Land Infrastructure and Transport, 2016). The mean annual air temperature ranges from 10 to 15 °C, and the mean annual precipitation ranges from 1000 to 1900 mm (Korean Meteorological Administration, 2020).
The Korean Peninsula is situated adjacent to the western Pacific Ocean and is surrounded by water on three sides (east, west, and south). It is a temperate region with four distinct seasons associated with the East Asian monsoon that occurs on the far eastern side of the Asian continent (Yi, 2011). Winter (December–February) is cold and dry because of the formation of the strong Siberian anticyclone from the Tibetan Plateau, whereas summer (June–August) is hot and humid, with around 70% of the annual precipitation concentrated in this period (Korean Meteorological Administration, 2020; Appendix A2). Although the Korean Peninsula is located at the eastern edge of the Eurasian continent, the humid air supplied from the Yellow Sea to the west affects the diversity and distribution of local plants.
The Korean Peninsula contains a large number of mountains centered on the Baekdudaegan Mountain Range, and only 22.5% of the peninsula is flat land (Ministry of Land Infrastructure and Transport, 2016; Appendix A3). In addition, although the elevation is generally not high, the region displays complex tectonic characteristics and a relatively diverse topography. Because the altitude gradients are shallower than those in other regions of East Asia, the borders between mountains and plateaus are relatively indistinct, making the region well suited to the spatiotemporal movement of plants (Ministry of Land Infrastructure and Transport, 2016).
There are around 4,300 known vascular plant species on the Korean Peninsula (with approximately 3,000 species in the southern part), including 280 species of pteridophyte, 53 species of gymnosperm, and 3,963 species of angiosperm. In terms of specialized genera, Pentactina, Echinosophora, Abeliophyllum, Hanabusaya, Mankyua, and Megaleranthis are present. According to the Whittaker biome classification (Whittaker, 1962), the southern part of the Korean Peninsula is mostly occupied by temperate seasonal forest biomes but may also contain some temperate rain forest and woodland/shrubland biomes (Fig. 1). In terms of the remnant vegetation landscape of the Korean Peninsula, strong policies to promote agriculture throughout the Joseon Period (1392–1910) led to a large decrease in forests and an increase in grassland and shrubland habitats. Later, in the southern part of the Korean Peninsula, the South Korean government pursued policies to promote forests from the 1970s onward, with the result that most natural habitats are now located in forests (Cho et al., 2018). Currently, approximately 30.3% of the southern part of the Korean Peninsula is urbanized or used for agriculture and 63.8% is occupied by forests, with other land covers accounting for the remaining 5.9% (Ministry of Land Infrastructure and Transport, 2016).

Plant distribution data

We used vascular plant distribution data based on specimen and coordinate data for plants collected between 2003 and 2015 in the southern part of the Korean Peninsula (Korea National Arboretum, 2016). The vascular plant distribution maps contained coordinate data for 309,333 specimens, corresponding to 2,954 taxa in 175 families and 919 genera. For analysis, a grid system (cell size, 11.2 km × 13.9 km) was overlaid on a national topographic map, and the taxa recorded in each grid cell were combined with the location coordinates into a single data set (Graham and Hijmans, 2006; Lenormand et al., 2019). All 771 grid cells were used in the analysis; however, some large urban regions had been excluded from the floristic survey conducted by the Korea National Arboretum, and the corresponding cells were left empty.
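The gridding step described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: it assumes that specimen coordinates are already projected to kilometres, that the 11.2 km dimension runs east-west and the 13.9 km dimension runs north-south, and that the grid origin is at (0, 0). The function names and the example records are hypothetical.

```python
from collections import defaultdict

CELL_W_KM = 11.2  # assumed east-west cell size
CELL_H_KM = 13.9  # assumed north-south cell size


def cell_index(x_km, y_km, x0=0.0, y0=0.0):
    """Return the (row, col) of the grid cell containing a projected point."""
    col = int((x_km - x0) // CELL_W_KM)
    row = int((y_km - y0) // CELL_H_KM)
    return row, col


def presence_absence(records):
    """Build a cells x taxa presence-absence matrix.

    records: iterable of (taxon, x_km, y_km) tuples (hypothetical layout).
    Returns the sorted occupied cells, the sorted taxa, and a 0/1 matrix.
    Cells without any record are simply not listed.
    """
    taxa_by_cell = defaultdict(set)
    for taxon, x_km, y_km in records:
        taxa_by_cell[cell_index(x_km, y_km)].add(taxon)
    cells = sorted(taxa_by_cell)
    taxa = sorted({t for members in taxa_by_cell.values() for t in members})
    matrix = [[1 if t in taxa_by_cell[c] else 0 for t in taxa] for c in cells]
    return cells, taxa, matrix
```

Repeated specimens of one taxon in the same cell collapse to a single presence, so the matrix records occurrence rather than collection effort.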

Analysis of floristic assemblage clusters and characteristics

Using distribution data for the 771 grid cells and 2,954 plant taxa, a self-organizing map (SOM) training data set was constructed in the form of a presence-absence matrix (771 rows × 2,954 columns) (Fig. 2). The ‘kohonen’ R package was used for the SOM algorithm (Wehrens and Kruisselbrink, 2018), and the output layer was composed of 81 output nodes arranged in a square lattice. To delineate the floristic types, hierarchical cluster analysis was applied to the weight vectors of the SOM map units, based on the Euclidean distances between them (via the function hclust in R using the complete linkage method). The optimal number of types was determined by evaluating the silhouette coefficient for 2–15 types (Rousseeuw, 1987). In mapping the regionalization results, the grid cells that were empty because of exclusion from the survey were filled using the most frequent type value among the surrounding eight cells. Because some island regions (Ulleungdo and Dokdo) showed heterogeneous values owing to their distance from the adjacent grid cells, mapping was performed using type values within the local range.
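As an illustration of how the silhouette coefficient can be used to compare candidate numbers of types, the following Python sketch computes the mean silhouette width of a labelled set of vectors with Euclidean distances. It is a simplified stand-in, not the R code used in the study: the candidate labelings are assumed to come from cutting a hierarchical clustering at different numbers of clusters, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Mean silhouette width (Rousseeuw, 1987) for one labeling."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_labeling(points, candidates):
    """Return the key of the candidate labeling with the highest mean silhouette.

    candidates: dict mapping a number of clusters to a list of labels
    (assumed to be produced elsewhere, e.g. by cutting a dendrogram).
    """
    return max(candidates, key=lambda k: mean_silhouette(points, candidates[k]))
```

In this sketch, `points` would correspond to the SOM weight vectors and each entry of `candidates` to one cut of the dendrogram between 2 and 15 clusters.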
The relationships in species composition between the floristic zones were analyzed using Venn diagrams (‘VennDiagram’ R package; Chen and Boutros, 2011) based on lists of species in each zone. After producing species catalogs for each zone, the common taxa (those appearing in all zones) and specific taxa (those appearing in only specific zones) were distinguished. The floristic composition of each zone was then characterized by identifying its specific taxa at the family level.
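The distinction between common and specific taxa amounts to simple set operations on the zone catalogs, as in the Python sketch below. It assumes that a specific taxon is one recorded in exactly one zone, which is one possible reading of the definition above; the function names, the zone labels, and the taxon-to-family lookup are hypothetical.

```python
from collections import Counter


def common_and_specific(zone_taxa):
    """Split zone catalogs into common and zone-specific taxa.

    zone_taxa: dict mapping a zone label to an iterable of taxon names.
    Returns (common, specific), where common is the set of taxa found in
    every zone and specific maps each zone to the taxa found only there.
    """
    catalogs = {zone: set(taxa) for zone, taxa in zone_taxa.items()}
    if not catalogs:
        return set(), {}
    common = set.intersection(*catalogs.values())
    specific = {}
    for zone, taxa in catalogs.items():
        elsewhere = set().union(*(t for z, t in catalogs.items() if z != zone))
        specific[zone] = taxa - elsewhere
    return common, specific


def family_counts(taxa, family_of):
    """Count taxa per family using a taxon -> family lookup (assumed input)."""
    return Counter(family_of[t] for t in taxa)
```

Tallying the specific taxa of each zone with `family_counts` gives a family-level summary of the kind described in the text.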

Environmental data and analysis

Geographic and climate factors were analyzed as macro-environmental factors for the defined floristic zones. Latitude and longitude were used as geographic factors, and air temperature and precipitation data, provided by the Korean Meteorological Administration (2020) and collected from 583 points between 1970 and 2010, were used as climate factors. In addition to the direct environmental data, the warmth index (WI) and coldness index (CI) were calculated and used as indirect climate data (Kira, 1945; Eqs. 1 and 2). The values for these environmental factors were extended to the whole southern part of the Korean Peninsula by linear interpolation, accounting for topography and altitude, using ArcGIS (ver. 10.0). The mean values of the environmental factors in each grid cell were then calculated and used in the analysis.
\begin{equation} \text{WI}=\sum_{1}^{n}\left(t-5\right),\quad t>5 \end{equation}
\begin{equation} \text{CI}=-\sum_{1}^{n}\left(t-5\right),\quad t<5 \end{equation}
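As a worked illustration of Eqs. 1 and 2, the Python sketch below evaluates both indices for one location. It assumes that t denotes the monthly mean air temperature in °C and that each sum runs over the months satisfying the stated condition; the monthly values in the usage note are invented for illustration and are not study data.

```python
THRESHOLD_C = 5.0  # threshold temperature used in Eqs. 1 and 2


def warmth_index(monthly_means_c):
    """WI: sum of (t - 5) over the months with t > 5 °C."""
    return sum(t - THRESHOLD_C for t in monthly_means_c if t > THRESHOLD_C)


def coldness_index(monthly_means_c):
    """CI: minus the sum of (t - 5) over the months with t < 5 °C.

    Implemented with the sign exactly as written in Eq. 2, so the
    returned value is non-negative.
    """
    return -sum(t - THRESHOLD_C for t in monthly_means_c if t < THRESHOLD_C)
```

For hypothetical monthly means of -3, 0, 5, 12, 18, 22, 25, 26, 21, 14, 7, and 1 °C, eight months exceed the threshold and contribute 7 + 13 + 17 + 20 + 21 + 16 + 9 + 2 = 105 to WI, whereas three months fall below it and contribute 8 + 5 + 4 = 17 to CI. The month at exactly 5 °C contributes to neither index.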
As physical factors affecting plant distribution, we used data on parent materials, topography, effective soil depth, and soil texture for the southern part of the Korean Peninsula (Rural Development Administration, 2010). Parent materials were categorized as acidic rock, metamorphic rock, sedimentary rock, Quaternary deposit, volcanic ash, or other; topography was categorized as mountain, hill, pediment, interrill area, fan, lava terrace, or other; effective soil depth was categorized into four classes (<20 cm, 20–50 cm, 50–100 cm, and >100 cm); and soil texture was categorized as sandy gravel, silt and sandy loam, clay loam, or clay (Appendix A4).
To test the effect of environmental factors on floristic composition and zonation, the geographic and climate data (mean annual temperature, annual precipitation, warmth index, and coldness index) were analyzed using one-way analysis of variance (ANOVA) followed by Tukey’s test (Zar, 1984). The categorical physical factors (parent materials, topography, effective soil depth, and soil texture) were analyzed using box plots for each zone. The ‘ggplot2’ R package was used for data visualization (Wickham, 2016). Statistical analyses were performed using R (R Core Team, 2019).
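The one-way ANOVA step can be illustrated by computing the F statistic directly from its definition, as in the Python sketch below. This is a simplified stand-in for the R analysis: it returns only the F statistic and its degrees of freedom, and the p-value and Tukey’s post hoc comparisons are left to a statistics library. The function name and the input layout (one list of grid-cell values per floristic zone) are assumptions.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of observations per zone.
    Requires at least two groups and some within-group variation.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared mean offset
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates that the zone means of the factor differ by more than the within-zone variation would explain.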