DNA extraction, pooling, sequencing
For all the population of our crossing scheme, Genomic DNA was extracted
from young leaves previously sampled, snap frozen and stored at -80C.
DNA was extracted following a modified CTAB method from Maloof lab
(https://openwetware.org/wiki/Maloof_Lab:96well_CTAB). DNA
concentration and purity was estimated with a NanoDrop ND-1000
spectrophotometer (Thermo Scientific, MA, USA). The DNA integrity was
confirmed was assessed using a 1% agarose gel with ethidium bromide.
Prior to sequencing, DNA concentration was measured with a Qubit
Fluorometer (Invitrogen, MA, USA).
For the bulk segregant analysis (BSA) resistant (R, plants showing HR)
and susceptible (S, plants without HR) bulks (n = 10) were
prepared by pooling equal amounts of DNA from each individual plant.
Further, the three accessions that originated the initial crossing were
also used for DNA isolation (the S parent DG1-S1, the R parent F1_#130
and the donor of HR SF48-O1, “R donor” (Fig. 2). For the WGS
experiment, 1 ug of genomic DNA of each sample (three accessions and two
bulks) was used for library preparation. Library preparation and whole
genome sequencing were carried out by Novogene (Cambridge, UK).
Sequencing libraries were generated using the NEBNext Ultra II DNA
Library Prep Kit for Illumina (New England Biolabs, UK) following the
manufactures’ protocol. The genomic DNA was randomly fragmented to a
size of 350bp by Bioruptor, then DNA fragments were size-selected with
sample purification beads. The selected fragments were then
end-polished, A-tailed, and ligated with the full-length adapter. After
these treatments, these fragments were filtered with beads. At last, the
library was analysed for size distribution by Agilent 2100 Bioanalyzer
(Agilent technologies, CA, USA) and quantified using real-time PCR.
Libraries were sequenced on an Illumina NovaSeq 6000 platform using 150
bp paired end (PE) reads.
K-mer
based genetic mapping
K -mer based genetic mapping was performed following the
recommendations of Comparative subsequence sets analysis (CoSSA)
workflows (Prodhomme et al., 2019). First, sequencing read quality was
assessed with FastQC (Andrews, 2010). Then, k- mer tables were
built with a k -mer size of 31 nucleotides using GlistMaker of the
GenomeTester4 v4.0 suite (Kaplinski, Lepamets, & Remm, 2015).k- mer that occurred only once were removed as likely resulting
from sequencing errors. To identify resistant (R )
haplotype-specific k -mers, GlistCompare of GenomeTester4 was used
to perform basic set operations such as unions, intersections of
differences between k -mer tables of different samples (Supporting
information: Fig. S1). An additional filtering step was carried out to
retain R haplotype-specific k -mers. The sequencing
yielded 14.4 Gb for the R bulk and an approximate 24x depth
considering a haploid B. nigra genome of ~550 Mb.
Given that B. nigra genome is diploid (2n = 2x = 16) and assuming
uniform sequencing coverage, k -mers originated from the Rhaplotype should have sequencing depth of ~12x. Thus, we
decided to retain k -mers with a depth between 10x and 20x which
represented k -mers derived from R-specific haplotype. Retainedk -mers were aligned to B. nigra reference genomes NI100
v2.0, C2 v1.0 and Sangam v1.0 using BWA aln (v0.7.17) allowing 3
mismatches (Li & Durbin, 2009). The number of mapped k -mers
mapped per 1 Mb bins were counted using bedtools v2.25 (Quinlan & Hall,
2010).
KASP
markers genotyping
Kompetitive Allele Specific Polymorphisms (KASP) PCR markers were used
to validate the results of k -mer based genetic mapping. For each
sample, DNA concentration was adjusted to 5-50 ng/µl. Primers were
designed on the sequences flanking the SNPs identified by k -mers
of the R-specific haplotype. KASP genotyping assays were performed
according to the manufacturer’s instructions (LGC Genomics, UK). In
brief, 2 µL DNA at a concentration of 5–50 ng/µL was added to 96-well
plate with KASP PCR mix (5 µL 2× KASP Master mix, 0.6 µL 10mM primer
mix, 2.4 Milli-Q water). The PCR was performed in a CFX96 Touch
Real-Time PCR Detection System combined with CFX Maestro Software for
data reading (Bio-Rad, Hercules, CA, USA).
Variant
calling within PEK locus
Single nucleotide polymorphisms (SNPs) and Insertion/Deletion (InDels)
variants were called using a workflow based on GATK Best Practices ( et
al., . Reads were aligned to the B. nigra reference genome C2
v1.0, to which the mitochondrial sequence (Genbank accession no.
NC_029182) and chloroplast sequence (Genbank accession no. NC_030450)
were added, using BWA mem v0.7.17 with default parameters (Li, 2013).
The resulting alignment files were sorted and indexed using SAMtools
v.1.11 (Li et al., 2009). Alignment files were filtered to restric
variant calling to the PEK locus. Duplicate read pairs were
marked using the MarkDuplicates tool of the GATK suite v.4.1.9.0.
Variants (SNPs and InDels) were called in each sample on a window of x
Mb around the PEK locus using GATK HaplotypeCaller. Samples were
then jointly genotyped using GATK GenomicsDBImport and GATK
GenotypeGVCFs, with default parameters. SNPs filtration was performed
with the following parameters: QD < 2, QUAL < 30,
SOR > 3, FS > 60, MQ < 40, MQRankSum
< -12.5, ReadPosRankSum < -8. InDels filtration was
performed with the following parameters: QD < 2, QUAL
< 30, SOR > 3, FS > 200,
ReadPosRankSum < -20. Finally, only variants that in agreement
with our genetic model were retained: i) heterozygous in resistant (HR+)
material; ii) homozygous DG1-S1 allele in susceptible (HR-) material.
The functional effect of the retained variants was predicted using
SnpEff with default parameters (Cingolani et al., 2012).
Copy number variations (CNVs) at PEK locus between three B.
nigra reference genomes was performed with the tool GEvo on the CoGe
web platform (Lyons et al., 2008). All genomes were accessed in
September 2021 from the following databases: B. nigra genomes
NI100 v2.0 and C2 v1.0 were downloaded from
http://cruciferseq.ca/, genome Sangam v1.0 from
http://brassicadb.cn/. Pairwise alignment of genomes was performed
with default settings: Alignment algorithm “(B)LastZ: Large regions”;
Word size: 8; Gap start penalty: 400; Gap extend penalty: 30; Chaining:
NO; Score threshold: 3000; Max threshold: 0; Minimum HSP length: 50.
CNVs were identified by visualizing the region between markers flankingPEK locus (M27 and M28) and identifying conserved regions/genes
that were highlighted as High Scoring Pairs (HSP).