2.5 Genetic variation in S-locus genes and their paralogs
To quantify genetic variation in the five S-locus genes
(CCMT , CYPT ,GLOT , KFBT ,PUMT ) and their four paralogs (CCM1 ,CYP734A51 , GLO1 , KFB1 ), we calculated nucleotide
diversity (π) at
synonymous
(πS)
and non-synonymous
(πN)
sites. For this analysis, we mapped the WGR reads of P. vulgaristo the chromosome-scale reference genome of P. veris (Potenteet al., 2022) because it provides a better annotation of all
S-locus genes and their paralogs outside the S-locus than the P.
vulgaris reference genome. Prior to mapping, we annotated sites at 4-
and 0-fold degenerate sites as synonymous and non-synonymous sites,
respectively, for all nine genes using the script NewAnnotateRef.py
(https://github.com/fabbyrob/science/tree/master/pileup_analyzers[last accessed on March 25, 2022]) (Williamson et al. , 2014).
Mapping of sequencing reads, variant calling, and filtering were
performed with BWA-mem v7.17 (Li & Durbin, 2009) and GATK v4.1.2.0
(McKenna et al., 2010) as described in section 2.3. Subsequently,
we split the VCF files of each of the nine genes into synonymous and
non-synonymous sites using BEDtools v2.29.2 (Quilan & Hall, 2010).
Using these VCF files, we estimated
πSand
πNwith pixy v.1.0.0 (Korunes & Samuk, 2021). Specifically, we estimated
πSand
πNof S-locus paralogs in heterostyles (i.e., pins and thrums combined) and
homostyles, while the estimates of
πSand
πNfor S-locus genes included thrums (since pins lack the S-locus) and
homostyles. Finally, we estimated the strength of purifying selection in
all S-locus genes and their paralogs by calculating the ratio of
nucleotide diversity at non-synonymous vs. synonymous sites
(πN/πS).
2.6 S- locus genotypes in natural populations of P.
vulgaris
To determine whether homostyles and thrums have either a haploid (i.e.,
S*/0 or S/0, respectively) or diploid (i.e., S*/S* or S/S, respectively)
S-locus, we calculated the relative sequencing depth of the S-locus
(RelS-locus depth ), as follows. We first
estimated the average site depth of the S-locus coding regions
(depthS-locus) and of the genome-wide coding regions
(depthGenome-wide) from filtered VCF files using
BCFtools v1.9, as indicated in section 2.5, then we calculatedRelS-locus depth as
depthS-locus/depthGenome-wide. This
normalization allowed us to account for differences of sequencing depth
among individuals. A RelS-locus depth value of
approximately 0.5 ± 0.25 indicates a haploid S-locus (S*/0 or S/0 for
homostyles and thrums, respectively), while a value close to 1 ± 0.25
indicates a diploid S-locus (S*/S* or S/S for homostyles and thrums,
respectively).
To determine whether the proportions of individuals carrying 0/0-, S/0-,
S*/S*-, and S*/0-genotypes in natural populations support the model with
lower or the one with equal viability of S*/S*-homostyles (Figure 1C and
D), we compared the observed frequencies of the four genotypes in EN4-T,
EN5-T, and EN6-M with the expected genotype frequencies predicted by
Crosby’s model (1949; Figure 1C and D). First, using Crosby’s equations
(1949), we calculated the expected frequencies of all four genotypes
(0/0-, S/0-, S*/S*-, and S*/0) at generations 10, 20, 30, and 40 after
the onset of homostyly, specifying either lower (v = 0.65) or equal (v =
1) viability of S*/S*-homostyles. Since natural frequencies of the four
genotypes might be compatible with levels of viability not included in
Crosby’s (1949) model, we additionally estimated expected genotype
frequencies under v = 0.9, 0.8, 0.7, 0.6, and 0.5 . Viability values
below 0.5 were not used because phenotypic frequencies reflecting these
conditions (roughly equal frequencies of pins and homostyles and absence
of thrums) have never been reported in natural populations.
Secondly, we estimated the observed frequencies of all four genotypes as
follows. We tallied the raw counts of pins (0/0-genotypes) and thrums
(S/0-genotypes) in EN4-T and EN5-T based on previous population surveys
(T04 and T07, respectively, in Mora-Carrera et al., 2021;
Material and Methods section 2.2, here). Furthermore, we calculated the
number of homostyles with S*/0- and S*/S*-genotypes in EN4-T, EN5-T, and
EN6-M by multiplying
the proportion of S*/0- and S*/S*-genotypes reported here (see Results
section 3.3) by the raw number of homostyles in each of the three
populations. Finally, to determine if observed and expected genotype
frequencies are compatible with Crosby’s model under lower or equal
viability of S*/S*-homostyles at 10, 20, 30, and 40 generations, we used
chi-squared tests with Bonferroni corrections. A significant difference
between observed and expected frequencies indicates that observed
genotype frequencies are not compatible with Crosby’s model, while
non-significant results indicate that they are compatible with it.