SNP filtering
SNP data and associated locus metadata supplied by DArT Pty Ltd were read
into a genlight object, as implemented in the R package adegenet
(Jombart, 2008), to facilitate subsequent processing with the R package
dartR (Gruber et al., 2018). We created two datasets by applying
different filters to the initial 19,903 polymorphic SNP loci: one for
the phylogenetic analysis (the ‘phylo’ dataset) and the other for the
PCoA and fixed-difference analyses (the ‘PCoA’ dataset). The
phylo dataset was initially filtered to remove any obviously introgressed
individuals within the MDB (identified using the PCoA dataset), because
reticulation events are not compatible with bifurcating trees. We then
retained only loci with a repeatability greater than 0.99 and a callrate
above 0.6. The PCoA dataset included all individuals and was first filtered
for repeatability, retaining loci with values greater than 0.99. The
second step removed secondary loci (additional SNPs found within the
same sequenced fragment), retaining for each fragment the locus with the
highest polymorphism information content (PIC). Finally, only loci with
a callrate above 0.9 were retained. The additional filtering steps were
applied to the PCoA dataset because the two analyses it serves
(ordination and the calculation of fixed differences) are sensitive to
excessive missing data and/or tightly linked loci. For both datasets,
the loci remaining after these primary filtering steps are regarded as
highly reliable. The PCoA dataset was also the basis for each additional
(stepwise) PCoA analysis of a subset of the individuals being compared;
loci that became monomorphic within a subset were removed before that
analysis.
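The filtering logic described above can be sketched in Python on toy data. This is a minimal illustration under stated assumptions, not the dartR code used in the study, and all class, field and function names (`Locus`, `fragment_id`, `rep_avg`, `filter_phylo`, `filter_pcoa`, `pcoa_subset`) are hypothetical. It assumes genotypes coded as 0, 1 or 2 copies of the alternate allele with `None` for a missing call, that callrate is the proportion of individuals with a non-missing call, and that PIC is computed from allele frequencies with the standard biallelic formula PIC = 1 - (p^2 + q^2) - 2p^2q^2; the PIC values used in the study may have been derived differently.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set

# Assumed genotype coding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


@dataclass
class Locus:
    locus_id: str
    fragment_id: str                # hypothetical: sequenced fragment the SNP lies on
    rep_avg: float                  # repeatability value from the locus metadata
    genotypes: Dict[str, Genotype]  # individual ID -> genotype


def callrate(loc: Locus) -> float:
    """Proportion of individuals with a non-missing genotype call."""
    calls = list(loc.genotypes.values())
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def pic(loc: Locus) -> float:
    """Biallelic PIC from the alternate-allele frequency (assumed formula)."""
    called = [g for g in loc.genotypes.values() if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    q = 1.0 - p
    return 1.0 - (p ** 2 + q ** 2) - 2 * p ** 2 * q ** 2


def is_monomorphic(loc: Locus) -> bool:
    """True if every called genotype is homozygous for the same allele (or none are called)."""
    called = {g for g in loc.genotypes.values() if g is not None}
    return called <= {0} or called <= {2}


def subset_individuals(loci: List[Locus], keep: Set[str]) -> List[Locus]:
    """Restrict every locus to the individuals in `keep`."""
    return [
        Locus(loc.locus_id, loc.fragment_id, loc.rep_avg,
              {ind: g for ind, g in loc.genotypes.items() if ind in keep})
        for loc in loci
    ]


def drop_secondaries(loci: List[Locus]) -> List[Locus]:
    """Keep one locus per fragment: the one with the highest PIC (first seen wins ties)."""
    best: Dict[str, Locus] = {}
    for loc in loci:
        current = best.get(loc.fragment_id)
        if current is None or pic(loc) > pic(current):
            best[loc.fragment_id] = loc
    kept = {loc.locus_id for loc in best.values()}
    return [loc for loc in loci if loc.locus_id in kept]


def filter_phylo(loci: List[Locus], introgressed: Iterable[str]) -> List[Locus]:
    """'phylo' dataset: drop flagged individuals, then repeatability > 0.99 and callrate > 0.6."""
    # Assumes every locus is scored for the same set of individuals.
    individuals = set(loci[0].genotypes) if loci else set()
    loci = subset_individuals(loci, individuals - set(introgressed))
    return [loc for loc in loci if loc.rep_avg > 0.99 and callrate(loc) > 0.6]


def filter_pcoa(loci: List[Locus]) -> List[Locus]:
    """'PCoA' dataset: repeatability > 0.99, remove secondaries, then callrate > 0.9."""
    loci = [loc for loc in loci if loc.rep_avg > 0.99]
    loci = drop_secondaries(loci)
    return [loc for loc in loci if callrate(loc) > 0.9]


def pcoa_subset(pcoa_loci: List[Locus], keep: Set[str]) -> List[Locus]:
    """Stepwise PCoA: restrict to a subset of individuals and drop loci now monomorphic."""
    return [loc for loc in subset_individuals(pcoa_loci, keep) if not is_monomorphic(loc)]
```

The order of operations follows the text: for the ‘PCoA’ dataset, secondaries are removed before the callrate filter, so callrate is assessed only on the locus retained from each fragment. Tie-breaking between equally informative secondaries is arbitrary in this sketch.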