2.4.1. Structure and principal component analyses
ADMIXTURE v1.3.0 (Alexander & Lange, 2011) was run using the set of 5,702 SNPs and all samples to evaluate the genetic structure among populations. ADMIXTURE was run using the “unsupervised model” (without using sampling locality as a prior). With this model setting, ADMIXTURE evaluates Q- values to ancestral genetic clusters (K) in different runs, ranging from one (K=1) to the number of sampled populations + 1 (in our case K=21), and gives an optimal K as the one achieving the lowest cross-validation (cv) error. Furthermore, ADMIXTURE was also run using the subset of 381 SNPs (“unsupervised model”) and all populations to compare the genetic structure among populations suggested by the set of species-specific SNPs to the structure suggested by the complete set of SNPs. ADMIXTURE was run using K values (1 to 15), the reason for this was the reduction in the number of localities due to the use of all allopatric samples as a single fixed population for both species.
To visualize patterns of genetic differentiation among individuals and populations, we used principal component analyses (PCA) in the R package SNPRelate v1.6.4, function snpgdsPCA( ) (Zheng et al., 2012), using the set of 5,702 SNPs and all samples. This was used to determine whether individuals from the allopatric regions and the two hybrid regions correspond to different genetic clusters.
2.4.2 Genetic diversity and Hardy Weinberg equilibrium
To test whether potential hybridization and introgression resulted in increased genetic diversity in the Spanish hybrid regions, we compared different genetic diversity estimates for I. elegans and I. graellsii from the allopatric and the sympatric distribution in each locality, for all individuals before excluding the hybrids (F1 and F2 hybrids) and after excluding hybrids. We calculated number of alleles (A), allelic richness (Ar), observed heterozygosity (HO), expected heterozygosity (HE), and nucleotide diversity (π) using the 5,702 SNPs set. Genetic diversity was measured at both population and regional levels. Na, and Ar were estimated using the HIERFSTAT package v0.04-22 (Goudet, 2005) as implemented in R. Allelic richness was rarefacted for a minimum of eight alleles (or four diploid samples), which was the lowest sample size used for Hardy-Weinberg equilibrium testing. Meanwhile Ho and HE were calculated using PLINK v1.90b6.12 (Purcell et al., 2007), and π and the percentage of SNPs with missing data (NA) with VCFtools. Kruskal-Wallis tests and posthoc paired Wilcoxon tests were used to compare the levels of each diversity estimate between regions (north-west and north-central hybrid regions, and allopatry).
Population and regional tests for Hardy-Weinberg disequilibrium per locus were estimated using the 5,702 SNPs using PLINKs mid-p modifier (Graffelman & Moreno, 2013). Then, the ratios of SNPs at HW disequilibrium (p < 0.05) to the total number of genotyped SNPs, as well as average p -values per population were calculated.