Introduction

Using biota to investigate biogeographic regions, including their spatial range and distribution patterns, is key to understanding the ecological processes that create biotic differentiation and biodiversity at multiple spatiotemporal scales (Ricklefs, 2004). In particular, as an important step in understanding the spatial structure of biodiversity, delineating the spatial range of these regions has been a foundational part of basic and applied research in biogeography, ecology, earth science, and conservation ecology (Brum et al., 2017; Graham and Hijmans, 2006; Ibanez-Erquiaga et al., 2018; Kreft and Jetz, 2010; Lenormand et al., 2019; Olson et al., 2001; Ricklefs, 2004; Sun et al., 2008). Until the late 20th century, the proposed biological regions were based on limited data, convenience, and the opinions of experts (White, 1983). Without distinct criteria, floristic zones were also suggested by delineating the distribution of endemic plants (Takhtajan, 1986) or by accumulating floristic checklist data and applying spatial statistics (McLaughlin, 1989). More recently, many schemes have been proposed for defining geographical spaces based on plants (Gonzalez-Orozco et al., 2014; Kreft and Jetz, 2010; Lenormand et al., 2019; Vilhena and Antonelli, 2015). Precise mapping and understanding of regional biodiversity patterns remain difficult because of issues surrounding the fidelity and reliability of floristic surveys, and because flora is closely related to environmental gradients (development and climate) and their complexity (Gonzalez-Orozco et al., 2014). If these challenges can be overcome, the delineation of biological spaces based on high-fidelity, reliable data will provide new metrics and perspectives through biogeographic regionalization (Lenormand et al., 2019).
The first attempt at biogeographic regionalization of the Korean Peninsula, made around 100 years ago, proposed convenient northern, central, and southern divisions based on descriptions of plant and vegetation characteristics (Nakai, 1919). Subsequently, more finely differentiated floristic zones were defined by redistributing the divisions of Nakai (1919) (Lee and Yim, 1978). Other authors proposed pseudo-floristic zones from the perspective of vegetation and climate (Yim and Kira, 1975). All of these previous floristic zones for the Korean Peninsula (including the pseudo-floristic zones), which were mostly developed from expert opinion or for convenience, have been structured around homogeneous bands based on the relationship between latitude and mean annual air temperature. In neighboring China, large-scale banded or planar vegetation-climate zones have been delineated using a wetness index (Sun et al., 2008). However, regions with a complex mountainous structure show marked changes in elevation and topography over short distances, and when this is combined with human influence, the result is especially complicated, spatiotemporally driven biogeographic regions (see the map of floristic zones on the Korean Peninsula in Appendix A1) (Lenormand et al., 2019).
Given the lack of regionalization based on biological distribution data (e.g., specimens), pseudo-biogeographical divisions have also been developed for conservation at large spatial scales (Olson et al., 2001; Sun et al., 2008). When delineating biogeographical zones, it is necessary to maximize differences between zones while also maximizing the homogeneity of the taxonomic assemblages within them (Stoddart, 1992). The quantitative accumulation of organism distribution data, the informatization of geography, and the availability of large-scale distribution data have improved analytical accuracy, which was previously a challenge for the extraction of floristic and other biological zones, and now enable quantitative and rigorous regionalization (Linder et al., 2012). In recent studies, floristic zones have been delineated using accumulated data to incorporate information about plant distribution (Gonzalez-Orozco et al., 2014). The reliability of point data for the distribution of organisms can only be ensured by using specimen data. With the use of global positioning systems (GPS), plant specimens are now collected together with spatial information. Since the late 20th century, accurate and extensive plant catalogs and data on the distribution of specimens have been collected (e.g., Korea National Arboretum, 2016), and floristic regions are being defined at the regional and national level using plant location data (Gonzalez-Orozco et al., 2014; Korea National Arboretum, 2016; Lenormand et al., 2019). It is therefore possible to develop approaches that delineate floristic zones reliably and accurately using these data. In particular, in the southern part of the Korean Peninsula, the plant specimens that have been collected and the distribution maps that have been compiled since 2000 can be evaluated with reliable large-scale data.
To analyze accumulated species distribution data, artificial neural networks (ANNs) are increasingly used as an alternative to traditional statistics for the analysis of multi-dimensional data (Chon, 2011; Cottrell et al., 2018; Snedden, 2019). Specifically, self-organizing maps (SOMs), an ANN-based technique using unsupervised learning, have been suggested as an alternative to conventional principal component analysis (Ahn et al., 2018; Chon, 2011). Essentially, SOMs are classified as a non-linear ordination method, since the training data set is non-linearly projected onto a lower-dimensional space (generally two dimensions) that approximates the probability density function of the data (Kohonen, 1995; Snedden, 2019). Unlike parametric statistical approaches, SOMs do not make assumptions about the correlations between variables or the distribution of variables (Chon, 2011; Giraudel and Lek, 2001; Snedden, 2019), and so are suitable for use with species presence or absence data (Céréghino et al., 2005). Efforts to derive and visualize zones from plant species distribution data using conventional multivariate statistical analyses have generally assumed a particular response of species data to environmental gradients, in accordance with eigen-based analytical approaches, but these analyses are limited when the shape of the species-abundance response (e.g., linear or unimodal) is not clear (Ahn et al., 2018; Liu et al., 2006; Snedden, 2019). Recently, numerous collections of distribution point data have been used for biological regionalization, but with such data it is not possible to ascertain the relationships between species as variables. SOMs reduce multidimensional data to two or three dimensions, making them useful for typification analyses, including regionalization, that use distribution points for a large number of plant species.
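As an illustration of the projection described above, the following is a minimal sketch of a SOM trained on a presence/absence matrix, written with NumPy only. It is not the implementation used in this study: the function names (`train_som`, `assign_units`), the grid size, the linearly decaying learning rate and neighborhood width, and the toy matrix of six grid cells by five species are all assumptions introduced for this example.

```python
import numpy as np

def train_som(data, rows=2, cols=2, n_iter=500, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM (hypothetical helper, illustration only).

    data: (n_sites, n_species) array of 0/1 presence-absence values.
    Returns the unit weight vectors, shape (rows * cols, n_species).
    """
    rng = np.random.default_rng(seed)
    n_sites, n_species = data.shape
    weights = rng.random((rows * cols, n_species))  # random initial weights in [0, 1)
    # Position of every unit on the 2-D output grid.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.3      # neighborhood width shrinks, stays > 0
        x = data[rng.integers(n_sites)]          # one randomly chosen site
        # Best-matching unit: the unit whose weights are closest to the site.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Gaussian neighborhood around the BMU, measured on the output grid.
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        # Pull the BMU and its grid neighbors toward the site vector.
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign_units(data, weights):
    """Return the index of the best-matching unit for every site."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Synthetic toy data (not from the study): 6 grid cells x 5 species.
presence = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

weights = train_som(presence)
units = assign_units(presence, weights)
print(units)  # one SOM unit index per grid cell
```

Grid cells with similar species composition tend to be assigned to the same or neighboring units, and no assumption is made about the shape of the species response. Grouping the trained units into zones would require a further clustering step that is not shown here.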
Quaternary glacial–interglacial oscillations have been an important mechanism in shaping the current distribution of plants (Ricklefs, 1987). The Korean Peninsula, around 80% of which is mountainous, is characterized by a backbone mountain range (the Baekdudaegan Mountains) running north to south, with sub-ranges branching off. The major mountains (≥1000 m above sea level) are considered to be a single glacial refugium based on altitude rather than latitude, and they have a mixture of boreal and temperate flora (Chung et al., 2017b; Chung et al., 2018; Kim et al., 2014). The botanical importance of peninsulas and mountainous regions is well established because of their topographic characteristics (Médail and Diadema, 2009). The mountains that form the core topography of the Korean Peninsula possess a floristic composition that has been affected by latitude and spatiotemporal gradients, and they might therefore be significant as functionally and evolutionarily distinct spaces. The flora of the Korean Peninsula is fundamentally controlled by a mixture of boreal and temperate abiotic conditions and, as in other regions, is affected by human agricultural and urbanization activities.
The accumulation of a large volume of recent distribution point data and the application of analytical methods suited to the nature of the data have resolved the previous difficulties of floristic regionalization, making it possible to propose rigorous floristic zones according to the actual distribution of species. To date, however, it has been difficult to find case studies that use accumulated, high-resolution, georeferenced specimen data for plants, or studies of geographic regionalization using SOMs. The purpose of the present study was to redefine the floristic zones in the southern part of the Korean Peninsula using SOMs and to understand the eco-evolutionary significance of the spatial distribution patterns. We used point distribution data for vascular plants collected at high resolution in the southern part of the Korean Peninsula between 2003 and 2015. Specifically, we aimed to (1) derive floristic delimitations, (2) identify the correlations with ecologically important environmental factors, and (3) discuss the eco-evolutionary significance of the derived regions for floristic assemblages.