Materials and methods
Sampling and DNA preparation
Fieldwork was carried out in the Lisbon Peninsula, where we searched for water bodies containing marbled newts. We specifically extended our search to include the Serra de Sintra mountains in the southwest of the peninsula, because of the observed presence of T. marmoratus at higher altitudes than T. pygmaeus in the northeastern section of the species’ hybrid zone (Arntzen & Espregueira Themudo, 2008). Twenty-five populations were found in an area spanning over 2000 km2, ranging from the Tejo River in the east and the south, the Atlantic Ocean in the south and the west, up to the city of Caldas da Rainha in the north, where our survey marginally overlapped with the area investigated by Espregueira Themudo et al. (2007). Adult and larval newts were captured by dip netting or with funnel traps. To reduce sampling bias e.g. towards siblings from particular breeding pairs (Goldberg & Waits, 2010) we made an effort to include all accessible sections of the water bodies. Tail tip tissue samples were collected and stored in 96% ethanol. The sampling was complemented with material from seven localities obtained earlier (Espregueira Themudo & Arntzen, 2007). DNA extraction of tissue samples followed the KingfisherTM (Thermo Scientific) and DNeasy extraction kit (Qiagen, Valencia, CA, USA) standard protocols.
SNP marker design
Species diagnostic SNPs were identified based on transcriptome data for one adult male T. marmoratus from San Pedro da Cova (coordinates 41.157 N, 8.496 W) and one adult male T. pygmaeus from Umbria, Serra de Monchique (coordinates 37.335 N, 8.506 W) that were sequenced at ZF-screens, Leiden on the Illumina HiSeq 2000 platform. The transcriptome libraries are available from Wielstra, McCartney-Melstad, Arntzen, Butlin, and Shaffer (2019), through the NCBI SRA at BioProject PRJNA498336 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA498336). Data filtering was carried out with Trimmomatic v0.36 (Bolger, Lohse, & Usadel, 2014) and the de novo transcriptome assembly with Trinity v2.5.1 (Grabherr et al ., 2011). SNP marker design followed the MIPs pipeline that encompasses advantages for targeted resequencing, including high specificity, a high level of multiplexing and no ascertainment bias (Hardenbol et al., 2003; Niedzicka, Fijarczyk, Dudek, Stuglik, & Babik, 2016). The SNP design process followed van Riemsdijk, Butlin, Wielstra, and Arntzen (2019); for details, see Supplemental Information I. The Xenopus tropicalis (Gray, 1864) genome obtained from Biomart ENSEMBL (genome version JGI4.2) was selected as a reference. Both marbled newts’ transcriptomes were blasted onto gene models of X. tropicalis in order to extract unambiguously mapping exon sequences for SNP discovery. SNP identification was performed in Mesquite v.3.40 after performing an alignment with Muscle v.3.8.31.
SNP detection and validation
SNP genotyping took place at the Institute of Biology Leiden / Naturalis SNP line facility using the Kompetitive Allele-Specific PCR (KASP) genotyping system (LGC genomics, UK). KASP is a fluorescence-based method determining the bi-allelic score of SNPs in uniplex assays. KASP is based on two allele-specific primers with a final base complementary to one of the two potential SNPs and unique tail sequence (Semagn, Babu, Hearne, & Olsen, 2014). The KASP master mix contains different fluorescent-labelled primers that become activated during PCR cycles, with the fluorescent signal increasing as more fluorescent primers are incorporated during the thermocycling of the PCR reaction. Primers were designed using the Kraken software and ordered from Integrated DNA Technologies (Wood & Salzberg, 2014).
For SNP validation, we used 120 T. pygmaeus and 118 T. marmoratus from 43 populations across the Iberian Peninsula, located outside the documented hybrid zone of these species (Arntzen, 2018; Figure 1 and Table S.1). The validation assay of 192 SNPs resulted in 147 markers being polymorphic, of which 81 were deemed species diagnostic for the reference samples. For the Lisbon Peninsula 354 individuals from 32 populations (25 new and 7 previously studied populations) were KASP genotyped for the 60 most promising nuclear SNPs. A further four primer sets were developed from the sequence information provided by Espregueira Themudo, Nieman, and Arntzen (2012), for the nuclear genes ß-Fibrinogen intron 7 (BF), Calreticulin intron C (CC) and Platelet-derived growth factor receptor α intron 11 (PDG) and for the mitochondrial gene NADH dehydrogenase subunit 4 (ND4).
Population genetics
Hardy-Weinberg equilibrium and genotypic disequilibrium among pairs of nuclear loci were tested with the R package GENEPOP v1.0.5, under the Benjamini-Hochberg correction (Benjamini & Hochberg, 1995; Rousset, 2008). Gene flow between genetically distinct populations produces admixture linkage disequilibrium among those loci that have different allele frequencies in the founding populations (Pfaff et al. , 2001). Admixture linkage disequilibrium was estimated following Barton & Gale (1993) and was based on 1,000 bootstrap replicates of the original dataset, using a script from van Riemsdijk, Butlin, Wielstra, and Arntzen (2019). The STRING v.10.5 protein-protein interaction network database (Szklarczyk et al ., 2015) was consulted to examine the functional linkage among the annotated nuclear markers, with reference to the X. silurana genome.
The SNP data were analysed with Structure software under the ‘admixture model’, given that neighbouring populations can interbreed (Pritchard, Stephens, & Donnelly, 2000; Falush, Stephens, & Pritchard, 2003). As the genotype pool could only be biallelic corresponding to the species to be analysed, the number of potentially differentiated gene pools was predetermined as K =2. The program was run for 100,000 generations after 100,000 generations of burn-in. Individuals were classified as pure T. marmoratus (Q < 0.01), or pure T. pygmaeus (Q > 0.99) or admixed (0.01 <Q < 0.99).
Environmental modelling
Species distribution models were constructed by the comparison of presence-only data for both species under reference to contemporary climate conditions. The biological data employed were 108 T. marmoratus and T. pygmaeus records that were supported by molecular species identification (Arntzen, 2018; present paper). The records for three genetically admixed populations from Portugal and Spain and three T. marmoratus populations from France were excluded. Explanatory variables were the 19 climate parameters of WorldClim 2.0 (bio01-bio19; see Fick and Hijmans, 2017). To identify and subsequently reduce collinearity, we constructed a half-matrix of pairwise Spearman correlation coefficients (rs). This matrix was subjected to clustering with UPGMA in PRIMER-e software (Clarke & Gorley, 2006). Parameters bio14 and bio15 were deemed unavailable for selection because of their regular high discrepancy between data sets (Varela, Lima-Ribeiro, & Terribile, 2015). Under the criterion of partial independence atr s<0.7, the parameters retained were bio01, bio02, bio03, bio05, bio06, bio08, bio12 and bio17 (Supplementary Information II). Logistic regression analyses were performed with SPSS 20 (IBM SPSS, 2016) with ‘species’ as the dependent variable. Parameter selection was in the forward stepwise mode under the criteria of Pin=0.05 and Pout=0.10 under the likelihood ratio criterion. Model fit was assessed by the area under the curve (AUC) statistic. The resulting two-species distribution model was then applied to climate reconstructions for the mid-Holocene and the Last Glacial Maximum (WorldClim version 1.4; Hijmans, Cameron, Parra, Jones, & Jarvis, 2005). In the absence of firm guidance of which climate reconstruction would be most appropriate to apply (Guevara, Morrone, & León‐Paniagua, 2019), distribution models were derived for all of them (i.e. nine models for the mid-Holocene and three models for the Last Glacial Maximum). Distribution models were visualized with ILWIS (ILWIS, 2009).