Genome-wide SNP genotyping
We used the same samples and quality-filtered reads included in our
previous study on the genetic architecture of limb length in A.
sagrei (Bock et al., 2021). This sampling includes the A. sagrei males used here for dewlap measurements, non-native population samples
obtained earlier in the invasion (i.e., in 2003), and samples from the
native range of A. sagrei. Using quality-filtered reads for all
samples (see detailed methods in Bock et al., 2021), we repeated the SNP
calling and variant filtering steps based on version 2.1 of the A.
sagrei genome, which recently became available (Geneva et al., 2021).
Reads were aligned to the genome using the dDocent v2.2.20 pipeline
(Puritz et al., 2014), and SNPs were called using Freebayes v. 1.3.2
(Garrison & Marth, 2012).
Filtering of the resultant variant calls was implemented using vcflib
(https://github.com/vcflib/vcflib) and consisted of sequential steps
based on number of alleles (i.e., keeping only biallelic markers), type
of variant (i.e., keeping only SNPs), read mapping quality (i.e., using
SNPs with a MAPQ score > 20), and depth of sequencing
(i.e., keeping only genotypes with DP > 7). For the
remaining filtered SNPs, we used BCFtools v.1.9 (Narasimhan et al.,
2016) to subset genotypes corresponding to the sequenced A.
sagrei individuals obtained in 2018 from Florida and Georgia, for which
dewlap trait data were also available (N = 561). Of the 561
samples, nine were sequenced in duplicate, resulting in 570 total
sequencing libraries. We then kept only SNPs genotyped in more than 70%
of samples and with a minor allele frequency > 1%, and
calculated identity-by-state (IBS) between samples using the
SNPRelate R package (v. 1.19.4; Zheng et al., 2012). Following previous
studies (e.g., Bock et al., 2021), we relied on IBS values between DNA
replicates to estimate the rate of genotyping errors in our dataset. We
then kept one replicate per sample and repeated the 70% call rate and
minor allele frequency filters described above across all 561
genetically distinct samples. Finally, we removed one sample that had
missing data at more than 30% of the final filtered SNPs, keeping the
remaining 560 genotypes for downstream analyses.
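
The call-rate, minor allele frequency, replicate IBS, and sample missingness steps described above can be illustrated with a minimal Python sketch. It is not the code used in the study, which relied on vcflib, BCFtools, and SNPRelate. The sketch assumes genotypes coded as 0, 1, or 2 copies of the alternate allele, with None for missing calls. It also assumes IBS is the mean proportion of shared alleles across jointly called SNPs, which may differ in detail from the SNPRelate calculation. All function names and the toy data are hypothetical.

```python
from typing import List, Optional

# Assumed coding: 0/1/2 alternate-allele copies, None = missing genotype.
Genotype = Optional[int]
SnpRow = List[Genotype]  # one SNP, one entry per sample


def call_rate(snp: SnpRow) -> float:
    """Fraction of samples with a non-missing genotype at this SNP."""
    return sum(g is not None for g in snp) / len(snp)


def minor_allele_frequency(snp: SnpRow) -> float:
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in snp if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(snps: List[SnpRow],
                min_call_rate: float = 0.70,
                min_maf: float = 0.01) -> List[SnpRow]:
    """Keep SNPs called in > min_call_rate of samples with MAF > min_maf."""
    return [s for s in snps
            if call_rate(s) > min_call_rate
            and minor_allele_frequency(s) > min_maf]


def ibs(snps: List[SnpRow], i: int, j: int) -> float:
    """Mean proportion of alleles shared by samples i and j (assumed definition)."""
    shared = [1.0 - abs(s[i] - s[j]) / 2.0
              for s in snps
              if s[i] is not None and s[j] is not None]
    if not shared:
        raise ValueError("samples share no called SNPs")
    return sum(shared) / len(shared)


def sample_missingness(snps: List[SnpRow], i: int) -> float:
    """Fraction of SNPs with a missing genotype for sample i."""
    return sum(s[i] is None for s in snps) / len(snps)


# Toy data: 4 SNPs x 5 samples; samples 0 and 3 stand in for a replicate pair.
toy_snps: List[SnpRow] = [
    [0, 1, 2, 0, None],     # passes both filters
    [0, 0, 0, 0, 0],        # monomorphic, fails the MAF filter
    [1, None, None, 2, 0],  # call rate 0.6, fails the call-rate filter
    [2, 2, 1, 2, 1],        # passes both filters
]
kept_snps = filter_snps(toy_snps)
replicate_ibs = ibs(kept_snps, 0, 3)
# Samples missing at more than 30% of the retained SNPs would be dropped.
low_quality = [i for i in range(5) if sample_missingness(kept_snps, i) > 0.30]
```

In this sketch, an IBS below 1 between two libraries of the same individual would indicate discordant genotype calls. Using that discordance as a measure of genotyping error is an interpretation made here for illustration, not necessarily the exact estimator applied to the data.
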
Of the resulting filtered SNPs, we removed candidate gametolog SNPs
following Bock et al. (2021). These gametolog SNPs occur on the X
chromosome of A. sagrei (Geneva et al., 2021) and are a result of
homology between the X chromosome (i.e., scaffold 7 in the A.
sagrei v2.1 genome assembly) and the Y chromosome, which is currently
not included in the genome assembly. After gametolog removal, we
excluded markers that were in strong linkage disequilibrium (i.e.,
r² > 0.4) by scanning the genome in
5,000-SNP windows using the ‘--indep-pairwise’ option in PLINK (v1.9;
Purcell et al., 2007). We then kept all SNPs located on the largest 14
scaffolds of the genome assembly, which correspond to the known number
of chromosomes for A. sagrei and cover more than 99% of the
total assembly length (Geneva et al., 2021).
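
The idea behind pairwise linkage disequilibrium pruning can be shown with a simplified Python sketch. It is not a reimplementation of PLINK, whose windowing, step size, and choice of which SNP to drop differ from this greedy single pass. The sketch assumes r² is the squared Pearson correlation of genotype dosages (0, 1, or 2, with None for missing) and that SNPs are ordered along a scaffold. The function names and toy genotypes are hypothetical.

```python
from typing import List, Optional

SnpRow = List[Optional[int]]  # dosages 0/1/2 per sample, None = missing


def r_squared(x: SnpRow, y: SnpRow) -> float:
    """Squared Pearson correlation of dosages over jointly called samples."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    return (sxy * sxy) / (sxx * syy)


def ld_prune(snps: List[SnpRow], window: int = 5000,
             r2_max: float = 0.4) -> List[int]:
    """Greedy pruning: keep a SNP only if its r^2 with every already-kept
    SNP fewer than `window` positions upstream is at most r2_max.
    Returns the indices of retained SNPs."""
    kept: List[int] = []
    for idx, snp in enumerate(snps):
        recent = [k for k in kept if idx - k < window]
        if all(r_squared(snp, snps[k]) <= r2_max for k in recent):
            kept.append(idx)
    return kept


# Toy data: 4 SNPs x 6 samples.
toy_ld_snps: List[SnpRow] = [
    [0, 1, 2, 0, 1, 2],  # reference SNP
    [0, 1, 2, 0, 1, 2],  # identical to SNP 0 (r^2 = 1)
    [2, 1, 0, 2, 1, 0],  # mirror image of SNP 0 (r^2 = 1)
    [0, 2, 1, 1, 0, 2],  # weakly correlated with SNP 0 (r^2 = 0.25)
]
pruned_indices = ld_prune(toy_ld_snps, window=5000, r2_max=0.4)
```

Here the two SNPs in perfect LD with the first marker are dropped, while the weakly correlated one is retained. A greedy pass was chosen for brevity, so the retained set depends on SNP order and is only an approximation of what a dedicated tool would return.
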