Genotyping SNPs
To generate the SNP dataset, we followed the PHYLUCE v.1.5.0 pipeline
with default parameters (Faircloth, 2016) for sequence trimming,de novo assembly of contigs, identification of UCE loci, and
sequence alignment. We generated a pseudo-genomic reference by aligning
each locus with MAFFT v7.407 and trimming using Gblocks v0.91b with
default parameters (Castresana, 2000). We then used Geneious v9.1.2 to
generate a consensus sequence for each locus, replacing ambiguity codes
with an appropriate nucleotide at random. We used Picard v1.106
(http://broadinstitute.github.io/picard/), and SAMtools v1.9 (Li
et al., 2009) to create sequence dictionaries and reference indices from
the reference. We used the PHYLUCE script snps.py to automate
alignment of trimmed reads from each sample to the reference with
BWA-MEM v0.7.17, and then called SNPs with the HaplotypeCaller tool of
the Genome Analysis Toolkit v3.7 (McKenna et al., 2010) following Giarla
and Esselstyn (2015). Using VCFtools v0.1.14 (Danecek et al., 2011), we
removed SNPs that failed to pass GATK quality filters (QD <
2.0 || FS > 60.0 || MQ
< 40.0 || HaplotypeScore > 13.0
|| MappingQualityRankSum < -12.5
|| ReadPosRankSum < -8.0), and selected SNPs
with a minimum depth of coverage of 8 per individual and a minor allele
frequency ≥ 5%. We used HD_plot.py (McKinney, Waples, Seeb, &
Seeb, 2017) to filter SNPs resulting from putative paralogs or wrongly
assembled contigs from the dataset by removing SNPs with heterozygosity
> 0.75 and a read-ratio deviation score D >10.
The D statistic is a measure of deviation from the expected allelic read
ratio of 1:1 when reads are summed over all heterozygous individuals.
This method more accurately identifies true SNP loci than methods
relying on read depth or heterozygote excess alone (McKinney et al.,
2017). After filtering with HD_plot.py , we removed SNPs that
were out of Hardy-Weinberg Equilibrium after Bonferroni correction for
multiple comparisons (p < 10-5) using
VCFtools v0.1.16. To generate a set of unlinked SNPs, we selected
a single SNP per UCE using VCFtools v0.1.16 ‘-thin 2000’. We used the
unlinked SNP dataset with 10% missing data for all SNP-based analyses
except calculation of genetic diversity and effective population size,
Principal Components Analysis (PCA), Discriminant Analysis of Principal
Components (DAPC) and Bayesian cluster analysis with STRUCTURE v2.3.4
(Pritchard, Stephens, & Donnelly, 2000), for which we used a dataset
with no missing data.