Identification of RAD loci and variant filtering
We demultiplexed the data to sublibraries (index barcodes) with
BamIndexDecoder v.1.03 (included in Picard Illumina2Bam, available fromhttp://gq1.github.io/illumina2bam/),
quality-filtered and further demultiplexed the reads to individual
accessions (inline barcodes) using process_radtags.pl from STACKS
v.1.47 (Catchen et al. 2013). We removed low-quality reads with an
uncalled base and corrected inline barcodes and RAD-tags with one
mismatch, retaining only full-length reads (94 bp). We concatenated
samples which had been sequenced twice. As there is no reference genome
available for Merianieae, we followed Brandrud et al. (2020) to first
create a pseudo-reference using denovo_map.pl from STACKS and followed
up with a mapping approach to improve coverage and the recovery of loci.
For each of the six species, we selected the ten samples with the
highest number of reads to be used for building a unique
pseudo-reference. We built several catalogues using different settings
and chose the best following the r80 rule of Paris et al. (2017) for
parameter optimization. We tested m (minimum number of identical
reads required to create a stack) four and five, subsequently increasedM (number of mismatches between stacks within individual)
starting from one, and n (number of mismatches allowed between
stacks between individuals) as n = M or n =M+1 (Paris et al. 2017). For each setting, we recorded the number
of tags retained with data for at least 80% of individuals and chose
the settings m 4, M 1 and n 2 which gave the
maximum number of reliable polymorphic loci. We extracted the .fasta
pseudo-reference from the optimized catalogue by including polymorphic
RAD loci that were present in at least 30% of samples and contained up
to nine SNPs using export_sql.pl from STACKS. We then mapped the raw
reads of each accession to this pseudo-reference separately using the
mem algorithm of BWA v.0.7.12-5 (Li & Durbin 2009), flagging shorter
split hits as secondary (–M). Next, we sorted the resulting aligned
.sam-files by reference coordinates and added read groups in the output
.bam-files with Picard Tools v.2.18.17 (Wysoker et al. 2013). Finally,
we performed a realignment around indels using the Genome Analysis
Toolkit v.3.8.1 (GATK; McKenna et al. 2010).
The six species compared in this study are not equally closely related.
Specifically, Adelobotrys adscendens , the lowland bee-pollinated
species, is part of a clade that diverged from core Merianieae (where
the other five species belong to) approximately 25 million years ago
(Dellinger et al. 2021). Whereas the common pseudo-reference we used for
mapping should be appropriate for the five core Merianieae species,
ascertainment bias may affect the inference for Ad. adscendensbased on this reference. To address this issue, we tested whether the
mapping rates for the six study species to the pseudo-reference differ
significantly from each other using ANOVA and Tukey-tests as post-hoc
tests. Indeed, the mapping rate of Ad. adscendens was
significantly lower than of all other species (F = 86.31, df = 5, p
< 0.01; for pairwise-comparisons see Table S3). We hence
decided to create a separate pseudo-reference for Ad. adscendens ,
based on its own accessions only, following the optimization procedure
described above (final settings used are m 5, M 6, n 6). Using this
approach, we increased the mapping rate for Ad. adscendens from
31.7% to 35.1% (Table S4; the reference includes
only polymorphic loci). We treated Ad. adscendens the same way as
all other species in subsequent analyses unless otherwise stated.