2.2 Sampling and sequencing of plant material
In spring 2019, we collected leaf tissue from 105 individuals ofP. vulgaris , comprising 74 heterostyles (37 pins and 37 thrums)
and 31 homostyles. The samples were obtained from nine populations,
including: six dimorphic populations (i.e., pins and thrums) from Turkey
(TR-D), Slovakia (SK-D), Switzerland (CH-D) and England (EN1-D, EN2-D,
and EN3-D); two trimorphic (i.e., pins, thrums, and homostyles)
populations from England (EN4-T and EN5-T); and one monomorphic (i.e.,
only homostyles) population from England (EN6-M) (Table 1). Populations
EN4-T, EN5-T and EN6-M corresponded to populations T04, T07, and M01,
respectively, from our previous study (Mora-Carrera et al .,
2021). Moreover, the 11 homostyles from EN6-M analyzed here included
three of the homostyles carrying CYPT -1 from
population M01 of Mora-Carrera et al . (2021; Figure 1B). DNA
extractions were performed using the Maxwell extraction method (Promega,
USA) at the Functional Genomics Center Zurich (Zurich, Switzerland).
Library preparation and paired-end sequencing (150bp) were conducted by
RAPiD GENOMICS (Gainesville, Florida, USA) using Illumina Novaseq6000
platform, generating 7,077,866,510 paired-end sequencing reads with an
average sequencing depth of 18.9 (± SD = 3.08).
2.3 Mapping and variant calling of WGR data to
CYPT from the P. vulgaris genome
To determine the sequences of all five exons and four introns ofCYPT , along with its up- and down-stream
intergenic regions, we mapped WGR reads from all 105 individuals to a
previously published genome of P. vulgaris (Cocker et al. ,
2018). We opted to use the less contiguous genome of P. vulgaris(Cocker et al., 2018) rather than the highly contiguous,
chromosome-scale genome of P. veris (Potente et al., 2022)
to enhance the likelihood of accurate mapping of WGR reads to intronic
as well as up- and downstream intergenic regions ofCYPT . Prior to mapping, we also produced a
reference S-locus assembly for P. vulgaris by replacing all
S-locus contigs (LH_v2_0002458, LH_v2_0067593, LH_v2_0003915, and
LH_v2_0000241) in the genome of P. vulgaris with a published
450 kb sequence of the P. vulgaris S-locus (Li et al .
2016). To determine the position of CYPT in the
reference S-locus assembly, we aligned the coding sequence ofCYPT against the reference using thequery function of blastn with default parameters, which is part
of the NCBI BLAST+ toolkit v2.6.0 (Camacho et al. , 2009).
Prior to mapping, Illumina adapters were clipped from raw reads with
Trimmomatic v0.38 (Bolger et al. , 2014) using default parameters.
Mapping was performed using BWA-mem v7.17 (Li & Durbin, 2009) with
default parameters. As negative control, pin individuals (0/0) were
included in the analysis and, as expected, none of the sequencing reads
from the 37 pins mapped to the S-locus. Duplicated reads were marked
with the MarkDuplicates tool included in Picard v2.18.4
(http://broadinstitute.github.io/picard/).
Variant calling of SNPs and indels for the S-locus was conducted using
HaplotypeCaller, implemented in the Genome Analysis Toolkit (GATK)
v4.1.2.0 (McKenna et al. , 2010) pipeline. Finally, SNP variants
were filtered from the Variant Call Format (VCF) file using the
SelectVariants with following filters: quality-by-depth (QD)
> 2.0; mapping quality (MQ) > 40.0; strand
bias (FS) < 60.0; mapping quality rank-sum test (MQRankSum)
> -12.5; a rank-sum test (ReadPositionRankSum)
> -8.0; site read depth (DP < ½X)
|| (DP > 3X). Additionally, sites with
fixed heterozygosity (i.e., InbreedingCoeff < -0.99) likely
representing incorrect SNP calling (O’Leary et al ., 2018; Pavanet al ., 2020) were filtered out.
2.4 Identification of mutations putatively disrupting
CYPT function in P. vulgaris
Disruptive mutations in CYPT coding
regions - To identify potential homostyle-specific, loss-of-functionCYPT mutations, including non-synonymous
mutations, insertions, and deletions, we compared the sequences of the
five CYPT exons in 31 homostyles with the
functional CYPT allele of the 37 thrums. For
this, we extracted the respective sequences of the fiveCYPT exons from the S-locus VCF file using theintersect function included in BEDtools v2.29.2 (Quinlan & Hall,
2010) and converted them into a single FASTA file using vcf2phylip.py
(https://github.com/edgardomortiz/vcf2phylip).
The sequence alignment of all exons and the detection of putatively
disruptive mutations in CYPT of homostyles and
thrums were performed with MEGA X (Kumar et al. , 2018). Finally,
we compared the resulting sequence alignment with an alignment of
previously detected mutations in CYPT exons
reported by Mora-Carrera et al . (2021).
Structural rearrangements in CYPT- To
determine whether the shift to homostyly is associated with structural
rearrangements involving CYPT exons, we
examined the paired-end sequencing reads mapped to the introns and exons
of CYPT in both thrums (as a reference) and
homostyles using the Interactive Genomic Viewer (IGV) v2.8.6 (Robinsonet al. , 2011). Translocations can be identified by analyzing the
mapped paired-end reads, where one read is mapped to one position in the
genome (e.g., CYPT in the S-locus), while its
mate-pair is mapped to a different position, either in the same or
different chromosome. Inversions can be detected by comparing the
orientation of the mapped read-pairs to CYPT in
the reference genome. If there is a small inversion, both mapped
paired-end reads should be oriented in the same direction (→→ or ←←),
whereas in the absence of a structural change normal mapped paired-end
reads should be oriented towards each other (→←). Finally, deletions are
characterized by drops in sequencing read coverage at specific positions
in the genome, in
this case, withinCYPT .
Disruptive mutations in CYPT promoter
region - To determine whether mutations in the promoter region are
responsible for CYPT loss of function in
homostyles, we conducted an analysis to identify the putativeCYPT transcription-factor binding-site and
searched it for homostyle-specific mutations. It is expected that the
3kb region upstream of a gene of interest contains its promoter,
including the transcription-factor binding-site (Yu et al. ,
2016), thus we first extracted and aligned the 3kb sequence upstream ofCYPT exon 1 from 20 thrums in dimorphic
populations TR-D, SK-D, CH-D, and EN1-D of P. vulgaris . We then
added the 3kb sequence upstream of CYPT from
the P. veris high-quality reference genome (Potente et
al., 2022) to the aligned dataset. Secondly, we inspected the aligned
upstream sequence of CYPT to look for any
single nucleotide polymorphism (SNP) that was fixed in homostyles but
absent in thrums. Thirdly, to identify enriched motifs of transcription
binding-sites in the aligned dataset, we employed the Simple Enrichment
Analysis (SEA) implemented in the Multiple Em for Motif Elicitation
(MEME) suite program v5.5.0 (McLeay & Bailey, 2010). We used theArabidopsis thaliana DAP motifs database (O’Malley et al. ,
2016) and default parameters. The identified enriched motif sequences
were compared between the 37 thrums and 31 homostyles by aligning these
regions with MEGA X (Kumar et al. , 2018). To determine the
potential function of the identified enriched motifs, we consulted TheArabidopsis Information Resource (TAIR)
(http://www.arabidopsis.org).