2. Material and Methods

2.1. Species complexes

Six species complexes showing discrepancies between morphological and COI-based identifications in the Swiss bee fauna were selected, mainly based on the comprehensive study of Schmidt et al. (2015). All reported cases of mitochondrial introgression in Switzerland were included, with the exception of Andrena nitida/limita and Colletes hederae/succinctus , for which not enough material was available. The following section provides information on the six species complexes.

2.1.1. Species complex 1: Andrena allosa/amieti/bicolor/montana

Species delimitation within this group (hereafter “bicolor -group”) has long remained controversial, especially in the alpine region where two taxa with debated status (F Amiet, Hermann, Müller, & Neumeyer, 2010) co-occur with A. bicolor , namelyA. allosa and A. montana . Phylogenetic analyses on a mitochondrial (COI) and a nuclear gene (LW-rhodopsin) confirmed the taxon validity of both A. allosa and A. montana , but also revealed the presence of a new, until then undescribed cryptic species,A. amieti (Praz et al., 2019). These analyses also revealed two sympatric clades for A. bicolor (also reported in Schmidt et al., 2015) and two sympatric clades for A. amieti . Genetic distances between these sympatric clades were comparatively high, with approximately 3.7% and 2.4% for A. bicolor and A. amieti , respectively, and hence comparable to distances among valid species in this group. In addition, the two sympatric clades withinA. amieti formed a paraphyletic unit from which the distinct species A. allosa arose. Whether the mitochondrial clades found in A. bicolor or A. amieti represent additional cryptic species within this group, remains unclear.
To complement the dataset, two specimens collected in Greece and probably representing an undescribed species were included in this species complex; this species is referred to as Andrena sp3, as in Praz et al. (2019).

2.1.2. Species complex 2: Andrena barbareae/cineraria

A. barbareae and A. cineraria are sibling species, morphologically very close, although identifiable in most cases by a combination of characters in both genders (Amiet et al. 2010). A. cineraria has a wider distribution than A. barbareae , which is mainly restricted to the Alps. Both species are polylectic but exhibit different phenologies, with two generations for A. barbareae and one for A. cineraria . Because of their morphological, biogeographical and phenological differences, both species are generally considered as separated although this view was recently challenged because both taxa share identical barcodes (Schmidt et al., 2015).

2.1.3. Species complex 3: Andrena carantonica/trimmerana/rosae

Taxonomical delimitation in this species complex is a long-standing enigma with controversial species delimitation hypotheses due to morphologically divergent generations and unclear morphological differentiation. First, Andrena rosae and A. stragulataare considered by most authors to represent the summer and spring generations of the same species (Falk, 2016; Reemer, Groenenberg, Van Achterberg, & Peeters, 2008; Westrich, 2014). Nevertheless, both taxa exhibit differences in terms of morphology, pollen collecting behaviour and, possibly, nesting sites (Amiet et al., 2010; van der Meer, Reemer, Peeters, & Neve, 2006; Westrich, 2014). Second, morphological differentiation of A. carantonica and A. trimmerana is challenging. No clear morphological character allows to separate females; and while males of the first generation of A. trimmeranadiffer in the morphology of the mandible, those of the summer generation cannot be differentiated from those of A. carantonica . Both taxa exhibit distinct phenologies with only one generation for A. carantonica (April to May) and two for A. trimmerana(March-April, June-July), although isolated late-summer specimens ofA. carantonica are known (C. Praz and R. Paxton, unpublished data). Both taxa overlap in their distribution area, but A. carantonica is much more abundant than A. trimmerana, for which only a few occurrences were reported in Switzerland or Germany.

2.1.4. Species complex 4: Andrena dorsata/propinqua

Depending on the author, A. dorsata and A. propinqua are considered separate or conspecific taxa (Amiet et al., 2010; Gusenleitner & Schwarz, 2002; Schmid-Egger & Scheuchl, 1997). Morphologically, the separation of both taxa is complicated and subject to errors, especially at the European scale. At the genetic level, two distinct clades mostly corresponding to the two taxa were recovered, however, both species were previously found to share COI, suggesting that identification errors could underlie the observed barcode sharing (Schmidt et al., 2015).

2.1.5. Species complex 5: Lasioglossum alpigenum/bavaricum/cupromicans

Species delimitation in this complex is generally accepted based on clear differences in male genital morphology (Amiet, Herrmann, Müller, & Neumeyer, 2001; Edmer, 1970). Identification of females is however challenging, and L. bavaricum and L. cupromicans were recently suggested to share the same COI barcode. However, since no male of L. bavaricum had been sequenced, identification of the sequenced specimens for this taxon remain tentative.

2.1.6. Species complex 6: Nomada goodeniana/succincta

The morphological separation between these two species is mainly relying on colour patterns and is therefore prone to identification errors, although both appear to differ in their hosts, phenology and possibly in the chemical composition of mandibular glandular secretions (Kuhlmann, 1997). Schmidt et al. (2015) found two divergent clusters for N. succincta : a northern European cluster containing specimens of N. goodeniana and a southern European cluster. A similar result was found in England (Creedy et al., 2019). As described above for A. dorsata/propinqua , the reported COI barcode sharing between N. goodeniana and N. succincta could potentially be due to misidentification, at least in Germany, where a previous study suggested that both species did not share COI barcode (Diestelhorst & Lunau, 2008).

2.2. Sampling

Most specimens were sampled across Switzerland in the frame of the “Red List of Swiss bee” project between 2008 and 2019. Additional specimens were collected in France, Italy and Greece. For the A. bicolorcomplex, sites known to harbour large populations or several species/clades in sympatry were additionally sampled in 2018. Bees collected within the red list surveys were killed in ethyl acetate, pinned and preserved dry. Samples collected in 2018 were preserved in 70% EtOH at 4°C to ensure good DNA preservation. All bees were morphologically identified by one of us (C. Praz); for A. carantonica/trimmerana , phenology was used in addition to morphology for identification. Further information on the sampling, as well as metadata and sampling maps are provided as Supplementary information (S1-S2); in addition, the COI sequence of every specimen used in this study has been uploaded onto the BOLD platform.

2.3. COI sequencing and phylogeny

The cytochrome oxidase unit I (COI) barcode fragment of all specimens was sequenced either by Sanger or NGS-barcode sequencing using the primer pairs LepF/LepR (Hebert, Penton, Burns, Janzen, & Hallwachs, 2004) or mlCOIntF/HCO (Leray et al., 2013) (following protocols in Gueuning et al., 2019). Raw sequences were imported into Geneious v11.0.5 and consensus sequences between forward and reverse sequences were constructed for each specimen using the Geneious assembler. Consensus sequences were aligned per species complex using MAFFT v7.308 (Katoh & Standley, 2013). The absence of stop codons within sequences was confirmed by translating and skimming the consensus sequences. Phylogenetic trees were built with RAxML v8.2.11 (Stamatakis, 2014) using the GTR GAMMA model and 100 bootstrap replicates. Since sequences were obtained using different methods (i.e. Sanger, NGS) and different primers, sequences were truncated to the same length (i.e. 265 bp).

2.4. UCE library preparation

Whole body DNA extractions were performed overnight in a proteinase K buffer at 56°C and purified using a Qiagen Biosprint 96 extraction robot following the manufacturer’s protocol. Extracts were quantified using Qubit v4 (Thermofisher Scientific) and 50 ng DNA per specimen were sonicated to 500 bp fragment length using a Bioruptor ultrasonicator (Diagenode). Two independent dual-indexed libraries containing each 96 specimens were constructed using a Kapa Hyper prep kit (Roche) using one fourth of the manufacturer’s recommended volumes (as described in Branstetter, Longino, Ward, & Faircloth, 2017). PCR amplifications were performed in the recommended volumes. PCR products were quantified using a Qubit v2 and each row of a 96-well PCR plate were pooled equimolarly (i.e. for total of 8 pools). Libraries were UCE enriched using the Hymenopteran v2 hybridization kit (UCE Hymenoptera 2.5Kv2 Principal/Full, myBaits, Arborbiosci). Each enrichment was performed on a single pool of 12 individuals using 500 ng. The enrichment protocol followed the manufacturer’s recommendations with a hybridization step of 24 h at 65°C, followed by a PCR amplification with 14 cycles. Pools were sequenced on a Miseq using five Illumina v3 kits (2 x 300 bp; Illumina, location, Switzerland).

2.3. Bioinformatic processing of UCE data

Fastq reads were demultiplexed on the Miseq and data from all runs were merged and processed mainly using PHYLUCE tools (Faircloth, 2016). Raw data were cleaned with illumprocessor (Faircloth, 2016), a tool wrapped around trimmomatic (Bolger, Lohse, & Usadel, 2014). Clean reads were assembled with SPAdes v3.12.0 (Nurk et al., 2013) using the single-cell flag (“–sc”), carful option (“–careful”) and a coverage cutoff value of five (“–cov-cutoff”). Obtained contigs were mapped against the corresponding UCE reference file using Lastz (Harris, 2007) and matching reads were extracted and aligned by species complex using MAFFT (Katoh & Standley, 2013). Since species within each complex are relatively close related (< 30-50 MYA), alignments were edge-trimmed and not internal-trimmed (Faircloth, 2016). Loci shared by less than 75% of the maximum number of specimens sharing a locus were filtered out. Remaining alignments were concatenated and saved in fasta format. An additional filtering step was applied to remove specimens with more than 90% missing data.

2.4. UCE analyses

The remaining concatenated alignments of the UCE amplicons were used for phylogenetic analyses. Maximum likelihood trees were produced for each species complex with RAxML v8.2.11 using the same parameters as for the COI RAxML trees. For comprehensiveness, two separate trees were produced for the “bicolor -goup” (i.e. one for A. amieti and one for A. bicolor ). Genetic distances between species/lineages for each species complex were computed with MEGA-x (Kumar, Stecher, Li, Knyaz, & Tamura, 2018) using the Tajima-Nei model.
Multivariate analyses and genetic distance tests were conducted in R, mainly using the adegenet package (Jombart, 2008). Sequences were first imported into R using the “fasta2genlight ” function which reads aligned sequences and extracts binary SNPs before converting files into a genlight object. After conversion, the SNPs matrices were filtered to retain only variable sites containing less than 15% missing data. Datasets were then screened for significant departure from Hardy-Weinberg equilibrium using the dartR package (Gruber, Unmack, Berry, & Georges, 2018). Principal component analyses were performed using the “dudi.pca ” function (ade4 package; Dray & Dufour, 2007) without scaling or centering the data. PCA results were plotted with ggplot2 (Wilkinson, 2011) using the two first components. To further identify and describe genetic clusters, discriminant analyses of principal components (DAPC) were performed. A first approach was used to verify the group’s membership using a prior knowledge on the species assignments. For the “bicolor -group”, A. bicolor andA. amieti were divided into two distinct groups (e.g. mitochondrial lineage 1 and 2; hereafter referred to as ML1 and ML2). The optimal number of PCs to retain was identified using both the plotted cumulative variance of the eigenvalues and a cross-validation method implemented in the “optim.a.score ” function. Results of posterior membership probabilities for each specimen were plotted using ggplot2. In a second approach, we ran a DAPC by grouping specimens into genetic clusters without species a priori knowledge. The function “find.clusters ” was used to determine the optimal number of genetic clusters which was defined as the solution harbouring the lowest Bayesian Information Criterion (BIC) value. Further, we computed for each species complex pairwise fixation indexes (Fst) between putative species (with A. amieti and A. bicolor treated as two lineages) using the dartR package with 10,000 permutations. Levels of observed genetic differences were tested using analyses of molecular variance (AMOVAs), using the dartR package with 10,000 permutations.
Because a potential pattern of isolation by distance (IBD) was observed in the phytogenic tree of A. amieti , correlation between genetic and geographical distance between sampling locations was computed using the “gl.ibd” function (dartR package) with 10,000 permutations. Genetic distances were computed upon Nei’s genetic distance (Nei, 1972) using the “stamppNeisD ” function from the StAMPP package (Pembleton, Cogan, & Forster, 2013) and geographical distances according to the “haversine method” using the geosphere package (Hijmans, Williams, & Vennes, 2019).
Finally, three independent analyses were performed for testing species delimitation. (I) First, we performed a Generalized Mixed Yule Coalescent model (GMYC) on ultrametric trees. Trees for each species complex were built with BEAST2 v2.5.2 (Bouckaert et al., 2014) using the JC69 substitution model and a strict molecular clock with a fixed rate of 1.0. Priors followed a yule model with a uniform distribution for “birthRate”. MCMC ran for 250 million generations with sampling every 1000 generations. Chain convergence was assessed using the software TRACER v1.6 (Rambaut, Drummond, Xie, Baele, & Suchard, 2018). For computational purposes, trees were resampled to a total of 100 trees using the logCombiner software. The ultrametric trees were then imported into R using the ape package (Paradis & Schliep, 2019). GMYC was performed on the last tree using the splits package (Ezard, Fujisawa, & Barraclough, 2009). Interval of species number was set between 0 and 10 and the analysis was run using the single-threshold version. (II) Second, results from the first analyses were cross-validated using a Bayesian implementation of the GMYC model (bGMYC) (Reid & Carstens, 2012). For each species group, the maximum number of possible “species” was set as twice the number of expected species present in the species complex. The MCMC was set to 11,000 generations (1000 generations burnin) and sampled every 10 generations (“thinning”). (III) Third, we performed an analysis on the concatenated sequences using the Bayesian Phylogenetics and phylogeography model (BPP) (Yang & Rannala, 2010). The BPP analyses were ran using the A11 model (i.e. unguided species delimitation analysis; Yang & Rannala, 2014) on the nexus files (each corresponding to one UCE) obtained after the 75% threshold filtering step. The population file was designed so that specimens were assigned to their species. For the “bicolor -group”, based on the phylogeny trees,A. bicolor was divided into two distinct lineages and A. amieti in one. Alpha and beta parameters of the inverted gamma distribution of the theta prior (average proportion of different sites between two sequences) were set to 3 and 0.004, respectively. For the tau prior, alpha and beta were respectively set to 3 and 0.002. The analyses was run two times with a MCMC of 50,000 generations and with a 10% burnin period.