Phylogeny, occurrence records, and niche models.
We obtained a dated phylogeny for all seed plants from Smith and Brown (2018; ALLMB phylogeny) and left polytomies unresolved. We used this phylogeny to generate a species list with which to query American specimen records from the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio). Records were then cleaned and filtered using the BiotaPhy Platform interface (https://biotaphy.github.io), following the platform's accepted best practices.
The full GBIF dataset (N = 36,335,199 records) is described and accessible at https://doi.org/10.15468/dl.gtgtt5. Briefly, GBIF records with any of the following flags were removed: TAXON_MATCH_FUZZY, TAXON_MATCH_HIGHER_RANK, TAXON_MATCH_NONE. Further processing was performed after aggregating GBIF and iDigBio records. For iDigBio, data cleaning and filtering produced a dataset of 13,667,523 records (initial N = 58,384,427; 23.4% retained). Initial iDigBio records were filtered by removing those with any of the following flags: GEOPOINT_DATUM_MISSING, GEOPOINT_BOUNDS, GEOPOINT_DATUM_ERROR, GEOPOINT_SIMILAR_COORD, REV_GEOCODE_MISMATCH, REV_GEOCODE_FAILURE, GEOPOINT_0_COORD, TAXON_MATCH_FAILED, DWC_KINGDOM_SUSPECT, DWC_TAXONRANK_INVALID, DWC_TAXONRANK_REMOVED. Full details are provided in the Dryad deposit associated with this study (https://doi.org/10.5061/dryad.9cnp5hqgx).
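The flag-based filtering described above can be illustrated with a minimal Python sketch. It is not the BiotaPhy implementation: the record layout (a dict with a "flags" list) and the function names are assumptions made for illustration, and only the flag names and record counts come from the text.

```python
# Flag names are taken from the text. The record layout (a dict with a
# "flags" list) is a hypothetical stand-in, not the BiotaPhy data model.
GBIF_REJECT_FLAGS = frozenset({
    "TAXON_MATCH_FUZZY", "TAXON_MATCH_HIGHER_RANK", "TAXON_MATCH_NONE",
})
IDIGBIO_REJECT_FLAGS = frozenset({
    "GEOPOINT_DATUM_MISSING", "GEOPOINT_BOUNDS", "GEOPOINT_DATUM_ERROR",
    "GEOPOINT_SIMILAR_COORD", "REV_GEOCODE_MISMATCH", "REV_GEOCODE_FAILURE",
    "GEOPOINT_0_COORD", "TAXON_MATCH_FAILED", "DWC_KINGDOM_SUSPECT",
    "DWC_TAXONRANK_INVALID", "DWC_TAXONRANK_REMOVED",
})


def drop_flagged(records, reject_flags):
    """Keep only records that carry none of the rejected flags."""
    return [r for r in records if not (set(r["flags"]) & reject_flags)]


def percent_retained(n_kept, n_initial):
    """Share of the initial records that survive filtering, in percent."""
    return 100.0 * n_kept / n_initial


# Toy records (invented for illustration only).
example = [
    {"id": "a", "flags": []},
    {"id": "b", "flags": ["GEOPOINT_0_COORD"]},   # rejected
    {"id": "c", "flags": ["SOME_OTHER_FLAG"]},    # flag not on the reject list
]
kept = drop_flagged(example, IDIGBIO_REJECT_FLAGS)
kept_ids = [r["id"] for r in kept]

# Retention figure reported in the text for iDigBio.
idigbio_retained = round(percent_retained(13_667_523, 58_384_427), 1)
```

A record is dropped if it carries at least one rejected flag, which matches the "any of the following flags" wording; flags outside the reject list do not affect retention.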
Aggregated GBIF and iDigBio records were then further processed by excluding: (1) points falling outside the study area (the Americas); (2) points with coordinates reported to fewer than four decimal places (~11 m near the equator); (3) duplicate localities (rarefaction); (4) points falling outside polygons describing accepted species’ distributions (defined by Plants of the World Online, POWO; Brummitt 2001; www.github.com/tdwg/wgsrpd); and (5) species with fewer than twelve records (to build reliable niche models).
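The five exclusion steps can be sketched as a single pass over the aggregated records. This is a minimal illustration under stated assumptions, not the pipeline used in the study: coordinates are assumed to arrive as decimal-degree text (so that trailing zeros count toward precision), the two spatial tests are supplied by the caller as hypothetical predicates, and the bounding box in the usage example is a crude stand-in for the Americas rather than the actual study-area definition.

```python
import math
from collections import defaultdict
from decimal import Decimal

MIN_DECIMALS = 4               # step 2 in the text
MIN_RECORDS = 12               # step 5 in the text
EARTH_RADIUS_M = 6_378_137.0   # WGS84 equatorial radius


def decimal_places(coord_text):
    """Number of digits after the decimal point in a coordinate string."""
    return max(0, -Decimal(coord_text).as_tuple().exponent)


def metres_per_step_at_equator(decimals):
    """Ground distance spanned by one unit in the last decimal place."""
    metres_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    return metres_per_degree * 10 ** -decimals


def clean(records, in_study_area, in_accepted_range):
    """Apply steps 1-5 to (species, lat_text, lon_text) tuples.

    in_study_area(lat, lon) and in_accepted_range(species, lat, lon) are
    hypothetical caller-supplied predicates standing in for the study-area
    and POWO polygon tests.
    """
    by_species = defaultdict(set)
    for species, lat_text, lon_text in records:
        if min(decimal_places(lat_text), decimal_places(lon_text)) < MIN_DECIMALS:
            continue                                   # step 2: too imprecise
        lat, lon = float(lat_text), float(lon_text)
        if not in_study_area(lat, lon):
            continue                                   # step 1: outside study area
        if not in_accepted_range(species, lat, lon):
            continue                                   # step 4: outside accepted range
        by_species[species].add((lat, lon))            # step 3: a set drops duplicates
    # Step 5: keep only species with enough unique localities.
    return {sp: sorted(pts) for sp, pts in by_species.items()
            if len(pts) >= MIN_RECORDS}


# Usage with toy data (species names and points are invented).
def crude_americas_box(lat, lon):
    return -60.0 <= lat <= 85.0 and -170.0 <= lon <= -30.0


records = [("Aa bb", f"{10 + i * 0.0001:.4f}", "-75.1234") for i in range(12)]
records += [
    ("Aa bb", "10.0000", "-75.1234"),   # duplicate locality
    ("Aa bb", "10.5", "-75.1"),         # fewer than four decimals
    ("Aa bb", "48.8566", "2.3522"),     # outside the bounding box
    ("Cc dd", "11.0001", "-70.0001"),   # species ends up with one record
]
cleaned = clean(records, crude_americas_box, lambda sp, lat, lon: True)
step_m = metres_per_step_at_equator(MIN_DECIMALS)
```

Here `step_m` evaluates to roughly 11.1 m, consistent with the ~11 m figure quoted for four-decimal precision near the equator. Counting unique localities in step 5 after deduplication is an assumption of this sketch; the text does not state the order in which the steps were applied.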
Cleaned records were then passed to MaxEnt (version 3.1.4; Phillips, Anderson, Schapire, 2006) along with 2.5 arc-minute resolution climate data from WorldClim (Fick and Hijmans, 2017) to build species distribution models (SDMs). We chose to perform our analyses using SDMs rather than point occurrence records for two reasons. First, SDMs offer a probabilistic way of describing expected species’ ranges based on the climate at sites where the species has been observed. In this way, SDMs convert presence/absence data into a continuously valued function, allowing us to ask how distributions are impacted by abiotic factors without having to arbitrarily bin species as, for example, alpine or montane. Second, using SDMs helps overcome some sampling limitations by providing insight into where species might occur given their climatic tolerances, even if they have not been sampled at that precise location (Barthlott et al. 2007; Meyer, Kreft, Guralnik, Jetz, 2015; Brummitt, Araújo, Harris, 2021). Although this could lead to erroneous predictions, for example that a northern boreal species should occur at extreme southern latitudes, we overcame this obstacle by masking the SDMs with polygons provided by POWO that define geographically broad areas where each species occurs based on expert assessments. This approach thus constrained SDMs by both known areas of occurrence and climatic tolerances.
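The masking step can be illustrated with a small Python sketch. It is a toy example under stated assumptions rather than the procedure used here: the SDM is represented as a nested list of suitability values on a regular grid, the expert range is a single simple polygon, a cell is kept when its centre falls inside the polygon, and masked cells are set to zero. All names and values are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_sdm(suitability, lons, lats, polygon, fill=0.0):
    """Keep suitability only in cells whose centre lies inside the polygon.

    suitability[i][j] is the value for the cell centred at (lons[j], lats[i]).
    Cells outside the polygon receive `fill` (zero is an assumption here).
    """
    return [
        [val if point_in_polygon(lon, lat, polygon) else fill
         for lon, val in zip(lons, row)]
        for lat, row in zip(lats, suitability)
    ]


# Toy 3 x 3 grid; rows run north to south, columns west to east.
lons = [0.5, 1.5, 2.5]
lats = [2.5, 1.5, 0.5]
suitability = [
    [0.9, 0.8, 0.7],
    [0.6, 0.5, 0.4],
    [0.3, 0.2, 0.1],
]
# Hypothetical expert range: a 2 x 2 degree square in the south-west corner.
expert_range = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]

masked = mask_sdm(suitability, lons, lats, expert_range)
# masked == [[0.0, 0.0, 0.0], [0.6, 0.5, 0.0], [0.3, 0.2, 0.0]]
```

The high suitability values in the northern row are removed because those cells lie outside the expert range, which mirrors how masking prevents a climatically suitable but geographically implausible region from entering the analysis.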