2. Material and Methods
2.1. Species complexes
Six species complexes showing discrepancies between morphological and
COI-based identifications in the Swiss bee fauna were selected, mainly
based on the comprehensive study of Schmidt et al. (2015). All reported
cases of mitochondrial introgression in Switzerland were included, with
the exception of Andrena nitida/limita and Colletes
hederae/succinctus , for which not enough material was available. The
following section provides information on the six species complexes.
2.1.1. Species complex 1: Andrena
allosa/amieti/bicolor/montana
Species delimitation within this group (hereafter
“bicolor -group”) has long remained controversial, especially in
the alpine region where two taxa with debated status (F Amiet, Hermann,
Müller, & Neumeyer, 2010) co-occur with A. bicolor , namelyA. allosa and A. montana . Phylogenetic analyses on a
mitochondrial (COI) and a nuclear gene (LW-rhodopsin) confirmed the
taxon validity of both A. allosa and A. montana , but also
revealed the presence of a new, until then undescribed cryptic species,A. amieti (Praz et al., 2019). These analyses also revealed two
sympatric clades for A. bicolor (also reported in Schmidt et al.,
2015) and two sympatric clades for A. amieti . Genetic distances
between these sympatric clades were comparatively high, with
approximately 3.7% and 2.4% for A. bicolor and A.
amieti , respectively, and hence comparable to distances among valid
species in this group. In addition, the two sympatric clades withinA. amieti formed a paraphyletic unit from which the distinct
species A. allosa arose. Whether the mitochondrial clades found
in A. bicolor or A. amieti represent additional cryptic
species within this group, remains unclear.
To complement the dataset, two specimens collected in Greece and
probably representing an undescribed species were included in this
species complex; this species is referred to as Andrena sp3, as
in Praz et al. (2019).
2.1.2. Species complex 2: Andrena
barbareae/cineraria
A. barbareae and A. cineraria are sibling species,
morphologically very close, although identifiable in most cases by a
combination of characters in both genders (Amiet et al. 2010). A.
cineraria has a wider distribution than A. barbareae , which is
mainly restricted to the Alps. Both species are polylectic but exhibit
different phenologies, with two generations for A. barbareae and
one for A. cineraria . Because of their morphological,
biogeographical and phenological differences, both species are generally
considered as separated although this view was recently challenged
because both taxa share identical barcodes (Schmidt et al., 2015).
2.1.3. Species complex 3: Andrena
carantonica/trimmerana/rosae
Taxonomical delimitation in this species complex is a long-standing
enigma with controversial species delimitation hypotheses due to
morphologically divergent generations and unclear morphological
differentiation. First, Andrena rosae and A. stragulataare considered by most authors to represent the summer and spring
generations of the same species (Falk, 2016; Reemer, Groenenberg, Van
Achterberg, & Peeters, 2008; Westrich, 2014). Nevertheless, both taxa
exhibit differences in terms of morphology, pollen collecting behaviour
and, possibly, nesting sites (Amiet et al., 2010; van der Meer, Reemer,
Peeters, & Neve, 2006; Westrich, 2014). Second, morphological
differentiation of A. carantonica and A. trimmerana is
challenging. No clear morphological character allows to separate
females; and while males of the first generation of A. trimmeranadiffer in the morphology of the mandible, those of the summer generation
cannot be differentiated from those of A. carantonica . Both taxa
exhibit distinct phenologies with only one generation for A.
carantonica (April to May) and two for A. trimmerana(March-April, June-July), although isolated late-summer specimens ofA. carantonica are known (C. Praz and R. Paxton, unpublished
data). Both taxa overlap in their distribution area, but A.
carantonica is much more abundant than A. trimmerana, for which
only a few occurrences were reported in Switzerland or Germany.
2.1.4. Species complex 4: Andrena
dorsata/propinqua
Depending on the author, A. dorsata and A. propinqua are
considered separate or conspecific taxa (Amiet et al., 2010;
Gusenleitner & Schwarz, 2002; Schmid-Egger & Scheuchl, 1997).
Morphologically, the separation of both taxa is complicated and subject
to errors, especially at the European scale. At the genetic level, two
distinct clades mostly corresponding to the two taxa were recovered,
however, both species were previously found to share COI, suggesting
that identification errors could underlie the observed barcode sharing
(Schmidt et al., 2015).
2.1.5. Species complex 5: Lasioglossum
alpigenum/bavaricum/cupromicans
Species delimitation in this complex is generally accepted based on
clear differences in male genital morphology (Amiet, Herrmann, Müller,
& Neumeyer, 2001; Edmer, 1970). Identification of females is however
challenging, and L. bavaricum and L. cupromicans were
recently suggested to share the same COI barcode. However, since no male
of L. bavaricum had been sequenced, identification of the
sequenced specimens for this taxon remain tentative.
2.1.6. Species complex 6: Nomada
goodeniana/succincta
The morphological separation between these two species is mainly relying
on colour patterns and is therefore prone to identification errors,
although both appear to differ in their hosts, phenology and possibly in
the chemical composition of mandibular glandular secretions (Kuhlmann,
1997). Schmidt et al. (2015) found two divergent clusters for N.
succincta : a northern European cluster containing specimens of N.
goodeniana and a southern European cluster. A similar result was found
in England (Creedy et al., 2019). As described above for A.
dorsata/propinqua , the reported COI barcode sharing between N.
goodeniana and N. succincta could potentially be due to
misidentification, at least in Germany, where a previous study suggested
that both species did not share COI barcode (Diestelhorst & Lunau,
2008).
2.2. Sampling
Most specimens were sampled across Switzerland in the frame of the “Red
List of Swiss bee” project between 2008 and 2019. Additional specimens
were collected in France, Italy and Greece. For the A. bicolorcomplex, sites known to harbour large populations or several
species/clades in sympatry were additionally sampled in 2018. Bees
collected within the red list surveys were killed in ethyl acetate,
pinned and preserved dry. Samples collected in 2018 were preserved in
70% EtOH at 4°C to ensure good DNA preservation. All bees were
morphologically identified by one of us (C. Praz); for A.
carantonica/trimmerana , phenology was used in addition to morphology
for identification. Further information on the sampling, as well as
metadata and sampling maps are provided as Supplementary information
(S1-S2); in addition, the COI sequence of every specimen used in this
study has been uploaded onto the BOLD platform.
2.3. COI sequencing and
phylogeny
The cytochrome oxidase unit I (COI) barcode fragment of all specimens
was sequenced either by Sanger or NGS-barcode sequencing using the
primer pairs LepF/LepR (Hebert, Penton, Burns, Janzen, & Hallwachs,
2004) or mlCOIntF/HCO (Leray et al., 2013) (following protocols in
Gueuning et al., 2019). Raw sequences were imported into Geneious
v11.0.5 and consensus sequences between forward and reverse sequences
were constructed for each specimen using the Geneious assembler.
Consensus sequences were aligned per species complex using MAFFT v7.308
(Katoh & Standley, 2013). The absence of stop codons within sequences
was confirmed by translating and skimming the consensus sequences.
Phylogenetic trees were built with RAxML v8.2.11 (Stamatakis, 2014)
using the GTR GAMMA model and 100 bootstrap replicates. Since sequences
were obtained using different methods (i.e. Sanger, NGS) and different
primers, sequences were truncated to the same length (i.e. 265 bp).
2.4. UCE library preparation
Whole body DNA extractions were performed overnight in a proteinase K
buffer at 56°C and purified using a Qiagen Biosprint 96 extraction robot
following the manufacturer’s protocol. Extracts were quantified using
Qubit v4 (Thermofisher Scientific) and 50 ng DNA per specimen were
sonicated to 500 bp fragment length using a Bioruptor ultrasonicator
(Diagenode). Two independent dual-indexed libraries containing each 96
specimens were constructed using a Kapa Hyper prep kit (Roche) using one
fourth of the manufacturer’s recommended volumes (as described in
Branstetter, Longino, Ward, & Faircloth, 2017). PCR amplifications were
performed in the recommended volumes. PCR products were quantified using
a Qubit v2 and each row of a 96-well PCR plate were pooled equimolarly
(i.e. for total of 8 pools). Libraries were UCE enriched using the
Hymenopteran v2 hybridization kit (UCE Hymenoptera 2.5Kv2
Principal/Full, myBaits, Arborbiosci). Each enrichment was performed on
a single pool of 12 individuals using 500 ng. The enrichment protocol
followed the manufacturer’s recommendations with a hybridization step of
24 h at 65°C, followed by a PCR amplification with 14 cycles. Pools were
sequenced on a Miseq using five Illumina v3 kits (2 x 300 bp; Illumina,
location, Switzerland).
2.3. Bioinformatic processing of UCE
data
Fastq reads were demultiplexed on the Miseq and data from all runs were
merged and processed mainly using PHYLUCE tools (Faircloth, 2016). Raw
data were cleaned with illumprocessor (Faircloth, 2016), a tool wrapped
around trimmomatic (Bolger, Lohse, & Usadel, 2014). Clean reads were
assembled with SPAdes v3.12.0 (Nurk et al., 2013) using the single-cell
flag (“–sc”), carful option (“–careful”) and a coverage cutoff
value of five (“–cov-cutoff”). Obtained contigs were mapped against
the corresponding UCE reference file using Lastz (Harris, 2007) and
matching reads were extracted and aligned by species complex using MAFFT
(Katoh & Standley, 2013). Since species within each complex are
relatively close related (< 30-50 MYA), alignments were
edge-trimmed and not internal-trimmed (Faircloth, 2016). Loci shared by
less than 75% of the maximum number of specimens sharing a locus were
filtered out. Remaining alignments were concatenated and saved in fasta
format. An additional filtering step was applied to remove specimens
with more than 90% missing data.
2.4. UCE analyses
The remaining concatenated alignments of the UCE amplicons were used for
phylogenetic analyses. Maximum likelihood trees were produced for each
species complex with RAxML v8.2.11 using the same parameters as for the
COI RAxML trees. For comprehensiveness, two separate trees were produced
for the “bicolor -goup” (i.e. one for A. amieti and one
for A. bicolor ). Genetic distances between species/lineages for
each species complex were computed with MEGA-x (Kumar, Stecher, Li,
Knyaz, & Tamura, 2018) using the Tajima-Nei model.
Multivariate analyses and genetic distance tests were conducted in R,
mainly using the adegenet package (Jombart, 2008). Sequences were first
imported into R using the “fasta2genlight ” function which reads
aligned sequences and extracts binary SNPs before converting files into
a genlight object. After conversion, the SNPs matrices were filtered to
retain only variable sites containing less than 15% missing data.
Datasets were then screened for significant departure from
Hardy-Weinberg equilibrium using the dartR package (Gruber, Unmack,
Berry, & Georges, 2018). Principal component analyses were performed
using the “dudi.pca ” function (ade4 package; Dray & Dufour,
2007) without scaling or centering the data. PCA results were plotted
with ggplot2 (Wilkinson, 2011) using the two first components. To
further identify and describe genetic clusters, discriminant analyses of
principal components (DAPC) were performed. A first approach was used to
verify the group’s membership using a prior knowledge on the species
assignments. For the “bicolor -group”, A. bicolor andA. amieti were divided into two distinct groups (e.g.
mitochondrial lineage 1 and 2; hereafter referred to as ML1 and ML2).
The optimal number of PCs to retain was identified using both the
plotted cumulative variance of the eigenvalues and a cross-validation
method implemented in the “optim.a.score ” function. Results of
posterior membership probabilities for each specimen were plotted using
ggplot2. In a second approach, we ran a DAPC by grouping specimens into
genetic clusters without species a priori knowledge. The function
“find.clusters ” was used to determine the optimal number of
genetic clusters which was defined as the solution harbouring the lowest
Bayesian Information Criterion (BIC) value. Further, we computed for
each species complex pairwise fixation indexes (Fst) between putative
species (with A. amieti and A. bicolor treated as two
lineages) using the dartR package with 10,000 permutations. Levels of
observed genetic differences were tested using analyses of molecular
variance (AMOVAs), using the dartR package with 10,000 permutations.
Because a potential pattern of isolation by distance (IBD) was observed
in the phytogenic tree of A. amieti , correlation between genetic
and geographical distance between sampling locations was computed using
the “gl.ibd” function (dartR package) with 10,000 permutations.
Genetic distances were computed upon Nei’s genetic distance (Nei, 1972)
using the “stamppNeisD ” function from the StAMPP package
(Pembleton, Cogan, & Forster, 2013) and geographical distances
according to the “haversine method” using the geosphere package
(Hijmans, Williams, & Vennes, 2019).
Finally, three independent analyses were performed for testing species
delimitation. (I) First, we performed a Generalized Mixed Yule
Coalescent model (GMYC) on ultrametric trees. Trees for each species
complex were built with BEAST2 v2.5.2 (Bouckaert et al., 2014) using the
JC69 substitution model and a strict molecular clock with a fixed rate
of 1.0. Priors followed a yule model with a uniform distribution for
“birthRate”. MCMC ran for 250 million generations with sampling every
1000 generations. Chain convergence was assessed using the software
TRACER v1.6 (Rambaut, Drummond, Xie, Baele, & Suchard, 2018). For
computational purposes, trees were resampled to a total of 100 trees
using the logCombiner software. The ultrametric trees were then imported
into R using the ape package (Paradis & Schliep, 2019). GMYC was
performed on the last tree using the splits package (Ezard,
Fujisawa, & Barraclough, 2009). Interval of species number was set
between 0 and 10 and the analysis was run using the single-threshold
version. (II) Second, results from the first analyses were
cross-validated using a Bayesian implementation of the GMYC model
(bGMYC) (Reid & Carstens, 2012). For each species group, the maximum
number of possible “species” was set as twice the number of expected
species present in the species complex. The MCMC was set to 11,000
generations (1000 generations burnin) and sampled every 10 generations
(“thinning”). (III) Third, we performed an analysis on the
concatenated sequences using the Bayesian Phylogenetics and
phylogeography model (BPP) (Yang & Rannala, 2010). The BPP analyses
were ran using the A11 model (i.e. unguided species delimitation
analysis; Yang & Rannala, 2014) on the nexus files (each corresponding
to one UCE) obtained after the 75% threshold filtering step. The
population file was designed so that specimens were assigned to their
species. For the “bicolor -group”, based on the phylogeny trees,A. bicolor was divided into two distinct lineages and A.
amieti in one. Alpha and beta parameters of the inverted gamma
distribution of the theta prior (average proportion of different sites
between two sequences) were set to 3 and 0.004, respectively. For the
tau prior, alpha and beta were respectively set to 3 and 0.002. The
analyses was run two times with a MCMC of 50,000 generations and with a
10% burnin period.