Results & Discussion

Our design approach resulted in a final bait set comprising 9,424 sequences of 200 bp each as well as 53 full genes: 49 of them immune-related and annotated as part of a major histocompatibility complex (MHC; Marra et al. 2019), two possibly involved in reproduction (VAMP4 and TCTEX1D2) and two in brain development (YWHAE and ARL6IP5) (Swift et al. 2016). The cumulative length of the bait set was 2,629,938 bp.
The baits were then tested on seven shark species (white shark, Carcharodon carcharias; bull shark, Carcharhinus leucas; tope, Galeorhinus galeus; basking shark, Cetorhinus maximus; porbeagle, Lamna nasus; shortfin mako, Isurus oxyrinchus; and spurdog, Squalus acanthias). All 36 samples produced sequence data; however, one white shark sample (“elasm-11022”) produced fewer than 100,000 reads (Supplementary Files 1 and 3) and was therefore excluded from further analysis. This was one of the older, but not the oldest, samples included; however, the tissue came from a dried specimen that had been lacquered with an unknown preservative and afterwards exposed to direct sunlight for many years.
For analysis of bait efficiency and SNP calling, the target sequences were reduced to single-copy genomic loci, amounting to 9,128 bait sequences. Coverage of the target regions varied across the species analyzed (Figure 2). In the Lamniformes (C. carcharias, L. nasus, I. oxyrinchus and C. maximus), more than two thirds of the target regions were recovered at 100% (5x coverage, averaged across samples; Table 2). These four species showed the highest number of covered target regions; the remaining species showed a recovery rate of only about one third of the target regions (again at 5x coverage, averaged across samples). Sequence divergence between target genes in these species, or their presence/absence, has likely caused the varying capture success. However, the proportion of on-target read bases varied only slightly between clades (61.37% – 65.97% vs. 51.57% & 52.15% – 54.75%, Table 2), indicating that, although the individual bait sequences vary in their efficiency, the method overall works successfully for all species included.
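The recovery statistic described above can be illustrated with a minimal Python sketch. It assumes that "recovered at 100%" means every base of a target region reaches at least 5x depth, and that the per-sample fractions are then averaged; the function names, the input layout and the toy depth values are hypothetical and are not taken from our pipeline.

```python
from statistics import mean

MIN_DEPTH = 5  # 5x threshold quoted in the text


def fully_covered_fraction(depths_by_target, min_depth=MIN_DEPTH):
    """Fraction of targets whose every base reaches min_depth.

    depths_by_target maps a target name to a list of per-base depths
    (hypothetical layout, e.g. parsed from a per-base depth table).
    """
    if not depths_by_target:
        return 0.0
    full = sum(
        1
        for depths in depths_by_target.values()
        if depths and min(depths) >= min_depth
    )
    return full / len(depths_by_target)


def mean_fraction_across_samples(samples, min_depth=MIN_DEPTH):
    """Average the per-sample fraction of fully covered targets."""
    return mean(fully_covered_fraction(s, min_depth) for s in samples.values())


# Toy data only: two samples, three targets, three bases per target.
toy_samples = {
    "sample_A": {"bait_1": [6, 7, 9], "bait_2": [5, 4, 8], "bait_3": [10, 12, 11]},
    "sample_B": {"bait_1": [5, 5, 5], "bait_2": [0, 0, 1], "bait_3": [3, 9, 9]},
}

toy_mean = mean_fraction_across_samples(toy_samples)
```

In this toy case, sample_A has two of three targets fully covered and sample_B has one of three, so the averaged fraction is 0.5.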
Indeed, SNP calling resulted in several tens of thousands of biallelic markers per species (Table 3, whole genome lenient). The basking shark reached the highest number, with over 130,000 biallelic SNPs, whereas the lowest number (< 36,000 SNPs) was recovered in the tope. The white shark was at the lower end of the range, with 40,243 biallelic SNPs detected. For all species, filtering SNPs for missing genotype data as well as minimum read coverage reduced the SNP numbers to less than half their initial values. The white shark had the lowest number of SNPs surviving this filtering process, with < 9,000 SNPs. The low SNP numbers in the white shark hint at a generally low genetic diversity in this species, supporting recent findings (Stanhope et al. 2023).
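The logic of this filtering step can be sketched as follows. This is an illustrative Python example only: the record layout, the function name and both thresholds (a minimum depth of 5 and a maximum of 20% missing genotypes) are assumptions chosen for the example, not the settings used to produce Table 3.

```python
def filter_snps(sites, min_depth=5, max_missing=0.2):
    """Keep biallelic sites with enough well-covered genotype calls.

    Each site is a dict with:
      "pos"   - position (int)
      "alt"   - list of alternate alleles
      "calls" - list of (genotype or None, read depth) per sample
    A call counts as missing if it has no genotype or its depth is
    below min_depth. Thresholds are hypothetical defaults.
    """
    kept = []
    for site in sites:
        if len(site["alt"]) != 1:  # not biallelic
            continue
        calls = site["calls"]
        if not calls:
            continue
        missing = sum(1 for gt, dp in calls if gt is None or dp < min_depth)
        if missing / len(calls) <= max_missing:
            kept.append(site)
    return kept


# Toy data only: three candidate sites genotyped in five samples.
toy_sites = [
    {"pos": 101, "alt": ["T"],
     "calls": [("0/1", 12), ("0/0", 8), ("1/1", 9), ("0/1", 7), ("0/0", 10)]},
    {"pos": 202, "alt": ["G"],
     "calls": [("0/1", 3), (None, 0), ("0/0", 9), ("0/1", 8), ("1/1", 6)]},
    {"pos": 303, "alt": ["C", "T"],
     "calls": [("0/1", 20), ("0/2", 18), ("1/2", 15), ("0/0", 22), ("0/1", 19)]},
]

kept_sites = filter_snps(toy_sites)
```

Here only the site at position 101 is retained: position 202 has two of five calls treated as missing, and position 303 is multiallelic. A sample with low read depth contributes many missing calls, which is one way a single poor sample can remove sites for an entire species.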
Excluding samples with low read numbers (see Table 3 for details) in most cases resulted in lower biallelic SNP numbers for both the basking shark and the white shark. However, excluding the failed white shark sample “elasm-11022” increased the SNP numbers for the white shark after subsequent filtering steps, likely due to the low read coverage of detected SNPs in this sample.
In general, our SNP numbers are comparable to, if not higher than, those in other target gene capture approaches from our group (Choquet et al. 2019; Choo et al. 2020, but see Choquet et al. 2023). Correlating the SNP numbers with the genomic distance to the white shark reference genome, we see that, if we exclude the white shark, SNP numbers remain constant over genomic distances of 0.01 to 0.045 and drop slightly thereafter (Figure 3). However, the relative reductions in SNP numbers between species and between higher-level taxa were low compared to other target gene capture experiments (Choo et al. 2020). This could be because the genomic distance between the shark species included, even if only roughly estimated here, is low. The three non-Lamniformes species show a genetic distance of 5 – 6% from the white shark. Although this estimate has to be treated with caution, as our approach favours genomic regions of higher similarity to the reference genome, our finding shows that the genomic distance between the shark species investigated here is extremely low compared to, for example, congeneric planktonic species (Choo et al. 2020; Choquet et al. 2023).
We conclude that the marker set presented here can be used for all species included, for studies of evolution, ecology, and conservation. The markers’ functionality across orders, together with the low molecular evolutionary rates characteristic of sharks (Hara et al. 2018), suggests that they could also be used for a larger range of shark species. Given the scarce genomic resources for the Selachii, we anticipate that such a “universal” shark genotyping method will be extremely useful in future research efforts.