Generating and assembling genomic data
To obtain a reference genome sequence, one P. serrata specimen
(98c_Pro_SE) was sequenced by Oxford Nanopore technology (ONT) on one
MinION II flowcell, producing 16.7 Gbp of data (3.3 million reads of
average size of 5 kbp). Library preparation and sequencing was performed
at Roy J. Carver Biotechnology Center (University of Illinois at
Urbana-Champaign, USA). The ONT reads were trimmed with filtlong
(v0.2.0; https://github.com/rrwick/Filtlong) to retain sequences of at
least 4000 bp and phred score of at least 20. Flye assembler (v2.5;
Kolmogorov, Yuan, Lin, & Pevzner, 2019) was used to perform de novo
genome assembly using ONT reads (with/without quality trimming) with the
expected genome size of 110 Mbp as a parameter. The resulted assemblies
were polished once with Racon (v1.3.3; https://github.com/isovic/racon)
and twice with Medaka (v0.6.5; https://nanoporetech.github.io/medaka/)
using the ONT reads. The final polishing was performed again with Racon,
this time using illumina reads. Quality checks of the final polished
assemblies were performed with BUSCO (v3; Waterhouse et al., 2017).
Biological origin of the assembled contigs was verified by blastx
(Altschul et al., 1997) of the polished assembly (98 contigs) against
the Pediculus humanus corporis genome (GCA_000006295.1
JCVI_LOUSE_1.0). The contigs that did not yield a hit (removed were 60
contigs, 0,46% of the assembly length) were excluded from the following
analyses after their non-louse origin was verified by blastx against
nr/nt NCBI database.
Whole genome re-sequencing was performed to generate metagenomic data
used to map the SNPs, and to assemble genomes of the L.
polyplacis symbionts and mitochondrial minichromosomes. gDNA libraries
for thirteen louse specimen were constructed for paired-end Illumina
sequencing with NovaSeq6000 instrument. All samples were sequenced on
one Illumina Novaseq lane producing on average 62.5 million 150 bp
paired-end reads (PE) per sample. Libraries were constructed with an
average insert size of 450bp. Fastq files were generated from the
sequence data using Casava v.1.8.2 or bcltofastq v.1.8.4 with Illumina
1.9 quality score encoding. All sequencing and fastq file generation was
carried out at the W. M. Keck Center (University of Illinois, Urbana,
IL, USA). Illumina reads of the re-sequenced specimens were checked for
quality and filtered in BBtools
(https://jgi.doe.gov/data-andtools/bbtools/), then assembled using the
SPAdes assembler v 3.10 (Bankevich et al., 2012), under default settings
with the parameter careful, decreasing number of mismatches and indels.
To analyze genetic structure in the area of secondary contact of theApodemus -Polyplax populations (and the presumed HZ of the
parasite), we compared patterns obtained from three different sources:
nuclear SNPs, mitochondrial genomes, and genomes of symbiotic bacteriumL. polyplacis . Our hypothesis (visualized in the Figure1 c, d)
predicts that for both maternally inherited markers (mitochondrial and
symbiotic genomes), the inter-lineage divergence (i.e. SE vs. SW) will
be much higher than any divergence within the lineages. This reflects
the hypothetical scenario of two lineages long isolated during they stay
in two different refugia and also during most of the recolonization
journey. On the contrary, since the hypothesis presumes rare
hybridization, no such clear distinction should be found in nuclear
SNPs. To visualize and quantify diversification of maternally inherited
markers, we reconstructed genealogical trees and calculated distance
matrices for mitochondrial and symbiotic genomes. To visualize nuclear
diversification using SNPs, we analyzed population structure using PCA
and reconstructed genealogical relationships by building a maximum
likelihood tree. Below, we provide details on processing the three
sources of the data.