Generating and assembling genomic data
To obtain a reference genome sequence, one P. serrata specimen (98c_Pro_SE) was sequenced by Oxford Nanopore technology (ONT) on one MinION II flowcell, producing 16.7 Gbp of data (3.3 million reads of average size of 5 kbp). Library preparation and sequencing was performed at Roy J. Carver Biotechnology Center (University of Illinois at Urbana-Champaign, USA). The ONT reads were trimmed with filtlong (v0.2.0; https://github.com/rrwick/Filtlong) to retain sequences of at least 4000 bp and phred score of at least 20. Flye assembler (v2.5; Kolmogorov, Yuan, Lin, & Pevzner, 2019) was used to perform de novo genome assembly using ONT reads (with/without quality trimming) with the expected genome size of 110 Mbp as a parameter. The resulted assemblies were polished once with Racon (v1.3.3; https://github.com/isovic/racon) and twice with Medaka (v0.6.5; https://nanoporetech.github.io/medaka/) using the ONT reads. The final polishing was performed again with Racon, this time using illumina reads. Quality checks of the final polished assemblies were performed with BUSCO (v3; Waterhouse et al., 2017). Biological origin of the assembled contigs was verified by blastx (Altschul et al., 1997) of the polished assembly (98 contigs) against the Pediculus humanus corporis genome (GCA_000006295.1 JCVI_LOUSE_1.0). The contigs that did not yield a hit (removed were 60 contigs, 0,46% of the assembly length) were excluded from the following analyses after their non-louse origin was verified by blastx against nr/nt NCBI database.
Whole genome re-sequencing was performed to generate metagenomic data used to map the SNPs, and to assemble genomes of the L. polyplacis symbionts and mitochondrial minichromosomes. gDNA libraries for thirteen louse specimen were constructed for paired-end Illumina sequencing with NovaSeq6000 instrument. All samples were sequenced on one Illumina Novaseq lane producing on average 62.5 million 150 bp paired-end reads (PE) per sample. Libraries were constructed with an average insert size of 450bp. Fastq files were generated from the sequence data using Casava v.1.8.2 or bcltofastq v.1.8.4 with Illumina 1.9 quality score encoding. All sequencing and fastq file generation was carried out at the W. M. Keck Center (University of Illinois, Urbana, IL, USA). Illumina reads of the re-sequenced specimens were checked for quality and filtered in BBtools (https://jgi.doe.gov/data-andtools/bbtools/), then assembled using the SPAdes assembler v 3.10 (Bankevich et al., 2012), under default settings with the parameter careful, decreasing number of mismatches and indels.
To analyze genetic structure in the area of secondary contact of theApodemus -Polyplax populations (and the presumed HZ of the parasite), we compared patterns obtained from three different sources: nuclear SNPs, mitochondrial genomes, and genomes of symbiotic bacteriumL. polyplacis . Our hypothesis (visualized in the Figure1 c, d) predicts that for both maternally inherited markers (mitochondrial and symbiotic genomes), the inter-lineage divergence (i.e. SE vs. SW) will be much higher than any divergence within the lineages. This reflects the hypothetical scenario of two lineages long isolated during they stay in two different refugia and also during most of the recolonization journey. On the contrary, since the hypothesis presumes rare hybridization, no such clear distinction should be found in nuclear SNPs. To visualize and quantify diversification of maternally inherited markers, we reconstructed genealogical trees and calculated distance matrices for mitochondrial and symbiotic genomes. To visualize nuclear diversification using SNPs, we analyzed population structure using PCA and reconstructed genealogical relationships by building a maximum likelihood tree. Below, we provide details on processing the three sources of the data.