2.5 Phylogeographic scenarios
Reciprocal monophyly was inferred on all markers as partition by gene
trees, which were used to evaluate the degree of divergence among
lineages using MrBayes v. 3.2.6 (Ronquist et al. 2012; Supplementary
Material 4). To reconstruct the scenario of divergence among the
different populations, we used species trees, inferred using two
different methods. We first ran an analysis with *BEAST v 2.2.0
(Bouckaert et al. 2014) using the three mitochondrial markers. We choose
to link time-trees for the three mitochondrial markers, since they are
on the same plasmid, where no recombination is expected at the time
scale considered in this study. However, the three markers have
different composition and different evolutionary rates, so we did not
link the molecular clock and evolution models. We tested the hypothesis
of molecular clock with the Clock Test using ML implemented in MEGA
v7.0.20 (Kumar et al. 2016). This hypothesis was rejected (p-value
< 0.0005). We therefore used, for each marker, an uncorrelated
lognormal relaxed clock model. The clock rate was fixed to 0.00553 ±
0.00115 substitution per site per million year for cox1 and
0.00631 ± 0.0035 for cytb (rates inferred for Procellariiformes
by (Nabholz et al. 2016)). The rate for the control region was estimated
by the model as no rate is published for Procellariiformes. Existing
rates for other birds could not be used either because they belong to
groups that are too distant (e.g. Moas (Baker et al. 2005) or Peafowls
(Kimball et al. 1997)), and the control region is highly variable among
groups (Ruokonen & Kvist 2002). Published rates for the CR were however
set in *BEAST as priors and did not affect the results (data not shown).
We used for each marker a model consistent with the result of
jModelTest2, and a Yule process species tree prior with a continuous
population size model. As for the MrBayes analysis, we ran each MCMC
chain with 50*106 generations, sampled every 1,000
generations, the first 25% of generations were discarded as burn-in and
we inspected the stationarity of the chains using Tracer (Rambaut et al.
2018). To test whether the colonization of the Atlantic could result
from birds of the Pacific passing through the Panama Isthmus,
individuals from the central Pacific (Marquesas archipelago, taxondichrous ) were added to the *BEAST inference. This taxon can be
considered as the best representative taxon for the central Pacific, as
it is the most widespread and numerous, since polynesiae is
considered synonym to dichrous (Austin et al. 2004), whilebannermani is best recognised as a species on its own (Kawakami
et al. 2018). The taxon gunax (from Vanuatu) has never been
sequenced, and actually the location of its breeding colonies is
unknown. Three dichrous cytb sequences retrieved from
Genbank (AY219949-AY219951) and three individuals from our own
collection were used. We ran a second *BEAST analysis by adding all the
nuclear markers independently to the three mitochondrial markers, using
the same MCMC parameters. Clock rates priors were set to 0.019
substitutions per site per million year for βfib and 0.013
substitution per site per million year for rag1 , since these
rates were estimated for birds (Groth & Barrowclough 1999; Prychitko &
Moore 1997). The clock rate for nuclear markers was estimated by the
model as no rate has been produced for petrels. In this latter case, we
kept only the individuals for which we had sequences for all markers. To
investigate the demographic history of lineages, we estimated population
size through time by estimating Extended Bayesian Skyline plot as
implemented in *BEAST, for each genetic unit previously delimited,
considering a generation time of 15 years (Precheur et al. 2016).
We also used a coalescent-based ABC approach to explore the best
demographic scenario describing the dataset of the combined
mitochondrial and nuclear markers using the program DIYABC v. 2.1.0
(Cornuet et al. 2014). ABC methods consist in the simulation of datasets
similar to the real dataset in terms of population and marker sizes.
First, in the Indian Ocean, we tested if the three populations emerged
simultaneously in a radiation event or in two disjoint events by
comparing the posterior probability of these three scenarios, and in the
case of the latter, which population was ancestral to the two others and
which one of the two remaining populations was ancestral to the other.
This hierarchical strategy was applied to each lineage independently,
then to the three Atlantic lineages, and finally considering the five
lineages together (see Supplementary Material 5 for a description of all
tested scenarii). For each possible scenario, 106pseudo-observed datasets were simulated, with the same ploidy and number
of loci per population as observed in the real dataset. We fixed uniform
priors for population sizes, times of size variation and divergence and
mutation and admixture rates priors (see Supplementary Material 4 for
details), from which we simulated the datasets. Summary statistics were
calculated from the simulated datasets and compared to the same
statistics obtained from the real dataset. The Euclidean distance was
calculated between the statistics obtained for each normalized simulated
dataset and those for the observed dataset (Beaumont et al. 2002).
Posterior probability of each scenario was then calculated using a
logistic regression on summary statistics produced by the 1% of the
simulated datasets closest to the real dataset. To reduce the
dimensionality of the data, a linear discriminant analysis was
preliminarily applied to the summary statistics (Estoup et al. 2012).
The scenario with the highest posterior probability value with a
non-overlapping 95% confidence interval (95% CI) was selected.
Results