Identification of RAD loci and variant filtering
We demultiplexed the data to sublibraries (index barcodes) with BamIndexDecoder v.1.03 (included in Picard Illumina2Bam, available fromhttp://gq1.github.io/illumina2bam/), quality-filtered and further demultiplexed the reads to individual accessions (inline barcodes) using process_radtags.pl from STACKS v.1.47 (Catchen et al. 2013). We removed low-quality reads with an uncalled base and corrected inline barcodes and RAD-tags with one mismatch, retaining only full-length reads (94 bp). We concatenated samples which had been sequenced twice. As there is no reference genome available for Merianieae, we followed Brandrud et al. (2020) to first create a pseudo-reference using denovo_map.pl from STACKS and followed up with a mapping approach to improve coverage and the recovery of loci. For each of the six species, we selected the ten samples with the highest number of reads to be used for building a unique pseudo-reference. We built several catalogues using different settings and chose the best following the r80 rule of Paris et al. (2017) for parameter optimization. We tested m (minimum number of identical reads required to create a stack) four and five, subsequently increasedM (number of mismatches between stacks within individual) starting from one, and n (number of mismatches allowed between stacks between individuals) as n = M or n =M+1 (Paris et al. 2017). For each setting, we recorded the number of tags retained with data for at least 80% of individuals and chose the settings m 4, M 1 and n 2 which gave the maximum number of reliable polymorphic loci. We extracted the .fasta pseudo-reference from the optimized catalogue by including polymorphic RAD loci that were present in at least 30% of samples and contained up to nine SNPs using export_sql.pl from STACKS. We then mapped the raw reads of each accession to this pseudo-reference separately using the mem algorithm of BWA v.0.7.12-5 (Li & Durbin 2009), flagging shorter split hits as secondary (–M). Next, we sorted the resulting aligned .sam-files by reference coordinates and added read groups in the output .bam-files with Picard Tools v.2.18.17 (Wysoker et al. 2013). Finally, we performed a realignment around indels using the Genome Analysis Toolkit v.3.8.1 (GATK; McKenna et al. 2010).
The six species compared in this study are not equally closely related. Specifically, Adelobotrys adscendens , the lowland bee-pollinated species, is part of a clade that diverged from core Merianieae (where the other five species belong to) approximately 25 million years ago (Dellinger et al. 2021). Whereas the common pseudo-reference we used for mapping should be appropriate for the five core Merianieae species, ascertainment bias may affect the inference for Ad. adscendensbased on this reference. To address this issue, we tested whether the mapping rates for the six study species to the pseudo-reference differ significantly from each other using ANOVA and Tukey-tests as post-hoc tests. Indeed, the mapping rate of Ad. adscendens was significantly lower than of all other species (F = 86.31, df = 5, p < 0.01; for pairwise-comparisons see Table S3). We hence decided to create a separate pseudo-reference for Ad. adscendens , based on its own accessions only, following the optimization procedure described above (final settings used are m 5, M 6, n 6). Using this approach, we increased the mapping rate for Ad. adscendens from 31.7% to 35.1% (Table S4; the reference includes only polymorphic loci). We treated Ad. adscendens the same way as all other species in subsequent analyses unless otherwise stated.