Phylogenomics
Total genomic DNA was extracted from herbarium specimens using a modified CTAB protocol (Doyle and Doyle, 1987). Nuclear target enrichment was used to capture 260 low- to single-copy nuclear (LSCN) genes using RNA baits design for Dioscorea (Soto Gomez et al., 2019). Genomic libraries were prepared using NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® (New England Biolabs, Ipswich, Massachusetts, USA) with AMPure XP magnetic beads and NEBNext® Multiplex Oligos for Illumina® (Dual Index Primer Sets I and II) as tags for simultaneous sequencing. Subsequently, the enriched libraries were multiplexed and sequenced on a HiSeq X platform (Illumina, Inc.) lane.
We filtered raw paired-end reads by removing adapter sequences and low-quality reads using Trimmomatic v.0.36 (Bolger et al., 2014). We used HybPiper v.1.3.1 (Johnson et al., 2016) to recover 260 nuclear genes and associated introns (Soto Gomez et al., 2019) using sequence data from the transcriptome of D. communis (SRA SAMN11290810) as a reference file following Viruel et al. (2019). We used nQuire (Weiß et al ., 2018) to calculate the number of read counts for each allele per single nucleotide polymorphism (SNP), and estimated the median value of allelic ratios per sample to classify each individual as diploid (<2) or polyploid (>2) as described and optimized for Dioscorea in Viruel et al.(2019). The percentage of polymorphic sites was calculated as the percentage of SNP positions compared to the total number of base pairs retrieved for each sample. Plastome data were recovered using HybPiper and the plastome of D. elephantipes as reference (GenBank NC_009601).
Sequences were aligned with MAFFT v.7 (Katoh, et al., 2002) using the –auto parameter, and debugged with trimAl v.1.4.1 (Capella-Gutiérrez et al., 2009) using the -automated1command. Phylogenomic trees were reconstructed using the concatenated nuclear DNA (nDNA) and plastid DNA (pDNA) datasets independently, and for each nuclear gene independently, using RAxML-NG (Katoh et al., 2019) and IQ-TREE (Nguyen et al., 2015), with a GTR+GAMMA substitution model and 1,000 bootstrap replicates. We used ASTRAL‐III (Zhang et al., 2018) to construct a species tree based on the independent nuclear gene trees, and SVDquartets to evaluate 10,000,000 random quartets -or all possible quartets if lower- and 10,000 bootstrap replicates, as implemented in PAUP* 4.0a146 (Swofford, 2002). Haplotype networks were reconstructed with plastid data using the TCS method as implemented in Popart v1.7 (Clement et al., 2002; Leigh and Bryant, 2015).
We used Structure (Pritchard et al., 2000) to further investigate the genetic clusters within and between taxa based on filtered SNP data from the concatenated nDNA dataset. We tested one to six genetic groups (K =1–6) allowing admixture at individual level, and correlated allele frequencies, by running five replicates with a 100,000 burn-in and a chain length of 1,000,000 simulations each. Structure Harvester (Earl and vonHoldt, 2012) was used to obtain likelihood values for the multiple values of K and to apply the delta K criterion to select the optimal K . We plotted Structure results for eachK value using StructuRly (Criscuolo and Angelini, 2020).
Divergence times were estimated using a Bayesian relaxed-clock approach implemented in BEAST 1.10.4 (Drummond and Rambaut, 2007) and a penalized likelihood approach as implemented in treePL (Smith and O’Meara, 2012) using the concatenated nDNA dataset containing one representative per taxon. In both analyses, the crown node of the African/Mediterranean clade was used as calibration by applying a minimum age of 24 MY and a maximum age of 40 MY, based on the age estimates from Viruel et al. (2016; 95% HPD of 24.3469–39.2223 MY). We applied the GTR+I+G substitution model, Yule tree prior, and an uncorrelated lognormal molecular clock and ran the analysis for 1 billion generations, sampling every 100,000 generations. The treePL analysis was conducted in two consecutive runs: i) applying the “prime” option to select the most optimal parameter values, and ii) a “thorough” analysis by settingopt = 1, optad = 2 and optcvad = 5.