Introduction
Using biota to investigate biogeographic regions, including their
spatial range and distribution patterns, is key to providing a better
understanding of the ecological processes that create biotic
differentiation and biodiversity at multiple spatiotemporal scales
(Ricklefs, 2004). In particular, as an important step in understanding
the spatial structure of biodiversity, delineating the spatial range of
these regions has been a foundational part of basic and applied research
in biogeography, ecology, earth science, and conservation ecology (Brum
et al., 2017; Graham and Hijmans, 2006; Ibanez-Erquiaga et al., 2018;
Kreft and Jetz, 2010; Lenormand et al., 2019; Olson et al., 2001;
Ricklefs, 2004; Sun et al., 2008). Until the late 20th century, proposed
biological regions were based on limited data, convenience, and expert
opinion (White, 1983). In the absence of explicit criteria, floristic
zones were also proposed by delineating the distributions of endemic
plants (Takhtajan, 1986) or by compiling floristic checklist data and
applying spatial statistics (McLaughlin, 1989). More recently, many
schemes have been proposed for defining geographical regions based on
plants (Gonzalez-Orozco et al., 2014; Kreft and Jetz, 2010; Lenormand et
al., 2019; Vilhena and Antonelli, 2015).
Producing precise maps and understanding the regional patterns of
biodiversity remain difficult because of concerns about the fidelity and
reliability of floristic surveys, and because flora is closely tied to
environmental gradients (e.g., development and climate) and their
complexity (Gonzalez-Orozco et al., 2014). If these challenges can be
overcome, delineating biological regions from high-fidelity, reliable
data will provide new metrics and perspectives for biogeographic
regionalization (Lenormand et al., 2019).
The first attempt at biogeographic regionalization of the Korean
Peninsula, made around 100 years ago, proposed convenient northern,
central, and southern divisions based on descriptions of plant and
vegetation characteristics (Nakai, 1919). Subsequently, more finely
differentiated floristic zones were defined by redistributing the
divisions of Nakai (1919) (Lee and Yim, 1978). Other authors proposed
pseudo-floristic zones from the perspective of vegetation and climate
(Yim and Kira, 1975). All these previous floristic zones for the Korean
Peninsula (including pseudo-floristic zones), which were mostly
developed from expert opinion or for convenience, have been structured
around homogeneous bands based on the relationship between latitude and
mean annual air temperature. In neighboring China, large-scale banded or
planar vegetation-climate zones have been delineated using a wetness
index (Sun et al., 2008). However, in regions with complex mountainous
terrain, elevation and topography change markedly over short distances;
combined with human influence, this produces especially complicated,
spatiotemporally driven biogeographic regions (see the map of floristic
zones on the Korean Peninsula in Appendix A1) (Lenormand et al., 2019).
Given the lack of regionalization based on biological distribution data
(e.g., specimens), pseudo-biogeographical divisions have also been
developed for conservation at large spatial scales (Olson et al., 2001;
Sun et al., 2008). When delineating biogeographical zones, it is
necessary to maximize the differences between zones while also
maximizing the homogeneity of the taxonomic assemblages within them
(Stoddart, 1992). The quantitative accumulation of organism distribution
data, the informatization of geography, and the availability of
large-scale distribution data, which previously posed a challenge for
extracting floristic and other biological zones, have improved
analytical accuracy and now enable quantitative and rigorous
regionalization (Linder et al., 2012). In recent studies, floristic
zones have been delineated from accumulated data to incorporate
information on plant distribution (Gonzalez-Orozco et al., 2014). The
reliability of point data on the distribution of organisms can only be
ensured by using specimen data. With the adoption of global positioning
systems (GPS), plant specimens are now being collected together with
spatial information. Since the late 20th century, accurate and extensive
plant catalogs and specimen distribution data have been compiled (e.g.,
Korea National Arboretum, 2016), and floristic regions are being defined
at the regional and national level using plant location data
(Gonzalez-Orozco et al., 2014; Korea National Arboretum, 2016; Lenormand
et al., 2019). It is, therefore, possible to develop approaches that use
these data to delineate floristic zones reliably and accurately. In
particular, for the southern part of the Korean Peninsula, the plant
specimens collected and the distribution maps compiled since 2000
provide reliable large-scale data for such an evaluation.
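The criterion of Stoddart (1992), high homogeneity within zones and large differences between zones, can be made concrete with a pairwise dissimilarity measure on presence/absence data. The sketch below uses the Jaccard dissimilarity, a common choice for floristic data but an assumption here rather than the metric used in this study; the function names and the toy grid cells are hypothetical.

```python
import numpy as np


def jaccard_dissimilarity(a, b):
    """Jaccard dissimilarity 1 - |A & B| / |A | B| between two presence/absence vectors."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:  # two empty species lists are treated as identical
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union


def within_between(data, zones):
    """Mean pairwise dissimilarity among cells of the same zone vs. different zones."""
    within, between = [], []
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            d = jaccard_dissimilarity(data[i], data[j])
            # Route each pair to the within-zone or between-zone list
            (within if zones[i] == zones[j] else between).append(d)
    return float(np.mean(within)), float(np.mean(between))


# Hypothetical toy example: 4 grid cells x 6 species, assigned to 2 zones.
cells = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 0],
]
zones = [0, 0, 1, 1]
within, between = within_between(cells, zones)
print(round(within, 3), round(between, 3))  # 0.417 0.95
```

A zoning that satisfies the criterion should show a within-zone mean clearly below the between-zone mean, as in this toy case; any other dissimilarity index could be substituted in `jaccard_dissimilarity`.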
To analyze accumulated species distribution data, artificial neural
networks (ANNs) are increasingly used as an alternative to traditional
statistics for the analysis of multi-dimensional data (Chon, 2011;
Cottrell et al., 2018; Snedden, 2019). Specifically, self-organizing
maps (SOMs), an ANN-based technique using unsupervised learning, have
been suggested as an alternative to conventional principal component
analysis (Ahn et al., 2018; Chon, 2011). Essentially, SOMs are
classified as a non-linear ordination method, because the training data
set is non-linearly projected onto a lower-dimensional space (generally
two dimensions) that approximates the probability density function of
the input data (Kohonen, 1995; Snedden, 2019). Unlike parametric
statistical approaches, SOMs make no assumptions about the correlations
between variables or the distributions of variables (Chon, 2011;
Giraudel and Lek, 2001; Snedden, 2019), and so are suitable for species
presence/absence data (Céréghino et al., 2005). Efforts to derive and
visualize zones from plant species distribution data using conventional
statistical analyses have generally assumed a particular response of
species to environmental gradients, as in eigenanalysis-based
approaches, but these analyses are limited when the shape of the
species-abundance response (e.g., linear or unimodal) is unclear (Ahn et
al., 2018; Liu et al., 2006; Snedden, 2019). Recently, numerous
collections of distribution point data have been used for biological
regionalization, but these approaches do not reveal the relationships
among species treated as variables. SOMs reduce multidimensional data to
two or three dimensions, making them useful for typification analyses
(including regionalization) based on distribution points for a large
number of plant species.
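The SOM procedure described above (find the best-matching node for each input, then pull that node and its lattice neighbors toward the input) can be sketched in a few lines of NumPy. This is a minimal sketch of a standard online Kohonen SOM, assuming a hypothetical matrix of grid cells × species presence/absence (1/0); the lattice size, learning-rate and neighborhood schedules, function names, and toy data are illustrative assumptions, not the settings used in this study.

```python
import numpy as np


def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, seed=0):
    """Minimal online SOM (illustrative sketch; schedules are assumptions).

    data: (n_samples, n_features) array, e.g. grid cells x species (1 = present).
    Returns node weight vectors with shape (rows * cols, n_features).
    """
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    # Coordinates of each node on the 2-D output lattice
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize node weights with randomly chosen input vectors
    weights = data[rng.integers(0, n_samples, rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly toward 0
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks, floored at 0.5
        x = data[rng.integers(0, n_samples)]
        # Best-matching unit (BMU): node whose weights are closest to x
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Gaussian neighborhood measured on the lattice, not in species space
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        # Pull the BMU and its lattice neighbors toward x
        weights += lr * h[:, None] * (x - weights)
    return weights


def map_samples(data, weights):
    """Return the index of the BMU for every sample (e.g. grid cell)."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Hypothetical toy data: 40 grid cells x 20 species.
# Cells 0-19 mostly contain species 0-9; cells 20-39 mostly contain species 10-19.
rng = np.random.default_rng(1)
data = np.zeros((40, 20))
data[:20, :10] = rng.random((20, 10)) < 0.8
data[20:, 10:] = rng.random((20, 10)) < 0.8

weights = train_som(data)
nodes = map_samples(data, weights)
shared = set(nodes[:20].tolist()) & set(nodes[20:].tolist())
print("nodes shared by the two floras:", shared)
```

With well-separated toy floras, the two groups of cells should map to disjoint sets of nodes; in a real regionalization, the trained nodes would then typically be clustered further and the resulting clusters mapped back onto the grid cells.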
Quaternary glacial–interglacial oscillations have been an important
mechanism in shaping the current distribution of plants (Ricklefs,
1987). The Korean Peninsula, around 80% of which is mountainous, is
characterized by a backbone mountain range (the Baekdudaegan Mountains)
running north to south, with sub-ranges branching off. The major
mountains (≥1000 m above sea level) are considered to form a single
glacial refugium defined by altitude rather than latitude, and they
harbor a mixture of boreal and temperate flora (Chung et al., 2017b;
Chung et al., 2018; Kim et al., 2014). The botanical importance of
peninsulas and mountainous regions, owing to their topographic
characteristics, is well established (Médail and Diadema, 2009). The
mountains that form the core topography of the Korean Peninsula have a
floristic composition shaped by latitudinal and spatiotemporal
gradients, which may give them mutually distinct ecological functions
and roles as evolutionary spaces. The flora of the Korean Peninsula is
fundamentally controlled by a mixture of boreal and temperate abiotic
conditions and, as in other regions, is affected by human agricultural
and urbanization activities.
The accumulation of a large volume of recent distribution point data and
the application of analytical methods suited to the nature of the data
have resolved the previous difficulties of floristic regionalization,
making it possible to propose rigorous floristic zones according to the
actual distribution of species. To date, however, it has been difficult
to find case studies using accumulated, high-resolution, georeferenced
plant specimen data, or studies of geographic regionalization using a
SOM. The present study aimed to redefine the floristic zones in the
southern part of the Korean Peninsula using SOMs and to understand the
eco-evolutionary significance of the resulting spatial distribution
patterns. We used point
distribution data for vascular plants collected at high resolution in
the southern part of the Korean Peninsula between 2003 and 2015. We
aimed to (1) derive floristic delimitations, (2) identify the
correlations with ecologically important environmental factors, and (3)
discuss the eco-evolutionary significance of the derived regions for
floristic assemblages.