Sanger sequencing data analysis
Sanger sequence data were evaluated, edited and aligned using Geneious
Prime 2021.2.2 (Biomatters Limited, Auckland, New Zealand) and 4Peaks
sequence viewer (Nucleobytes B.V., Aalsmeer, the Netherlands). We
compared the sequence data with the respective orthologs available in GenBank
using BLAST searches (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Treponema sequence data were analysed for signatures of positive selection
following the tools and algorithms described by Maděránková et al.
(2019). Briefly, positively
selected sites were identified from the sequence alignments using (i) a
codon-based site model implemented in the EasyCodeML package (Gao et al.,
2019) and/or (ii) a mixed effects model of evolution (MEME), which uses a
hypothesis-testing approach, via the Datamonkey webserver (Murrell et
al., 2012; Weaver et al., 2018). For the CodeML analysis, the phylogenetic
trees were constructed using RAxML-NG (Kozlov et al., 2019).
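The "and/or" combination of the two approaches can be illustrated with a minimal sketch. It assumes, as is common practice but not stated above, that the CodeML site analysis compares a nested model pair such as M7 vs M8 (two extra free parameters, so the likelihood ratio statistic is compared with a chi-square distribution with 2 degrees of freedom), that CodeML sites are called at a Bayes Empirical Bayes posterior of at least 0.95, and that MEME sites are called at p ≤ 0.1. The function names and input dictionaries are hypothetical; EasyCodeML and Datamonkey perform these steps themselves.

```python
import math


def chi2_sf_df2(x: float) -> float:
    """Survival function P(X > x) of a chi-square distribution with 2 df.

    With 2 degrees of freedom the chi-square distribution is exponential
    with mean 2, so P(X > x) = exp(-x / 2) for x >= 0.
    """
    return 1.0 if x <= 0 else math.exp(-x / 2.0)


def site_model_lrt(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Likelihood ratio test for a nested site-model pair (assumed M7 vs M8).

    Returns (2 * delta lnL, p-value), assuming df = 2.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_df2(stat)


def candidate_sites(beb_posteriors: dict[int, float],
                    meme_pvalues: dict[int, float],
                    beb_cutoff: float = 0.95,
                    meme_cutoff: float = 0.1) -> set[int]:
    """Codon sites flagged by CodeML and/or MEME (union of both sets).

    beb_posteriors: codon position -> posterior probability of omega > 1
    meme_pvalues:   codon position -> MEME site p-value
    (both inputs are hypothetical, parsed beforehand from the tool outputs)
    """
    codeml_sites = {s for s, p in beb_posteriors.items() if p >= beb_cutoff}
    meme_sites = {s for s, p in meme_pvalues.items() if p <= meme_cutoff}
    return codeml_sites | meme_sites
```

Site-level calls are usually only interpreted when the model-level test is significant; requiring both methods to agree instead would simply replace the union `|` with the intersection `&`.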
Phylogenetic trees and networks were reconstructed as follows: maximum-likelihood
and Bayesian trees with IQ-TREE 2.0.7 (Minh et al., 2020) and MrBayes 3.2.7
(Ronquist et al., 2012), respectively, and minimum spanning trees with the
MSTree V2 algorithm within the GrapeTree program (Zhou et al., 2018).
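To show what a minimum spanning tree is built from, the sketch below applies the classic Prim's algorithm to a symmetric matrix of pairwise distances (for example, counts of differing sites between samples). This is a plainly swapped-in textbook technique, not GrapeTree's MSTree V2, which is a more refined algorithm; the sample labels and distance values are hypothetical.

```python
import heapq


def prim_mst(labels: list[str], dist: list[list[int]]) -> list[tuple[str, str, int]]:
    """Minimum spanning tree over a symmetric distance matrix (Prim's algorithm).

    labels:     sample names (hypothetical)
    dist[i][j]: pairwise distance between samples i and j
    Returns the tree as a list of (parent, child, distance) edges.
    """
    n = len(labels)
    if n == 0:
        return []
    in_tree = [False] * n
    in_tree[0] = True
    edges = []
    # Candidate edges as (distance, from_index, to_index), seeded from sample 0.
    heap = [(dist[0][j], 0, j) for j in range(1, n)]
    heapq.heapify(heap)
    while heap and len(edges) < n - 1:
        d, i, j = heapq.heappop(heap)
        if in_tree[j]:
            continue  # stale entry: j was already connected by a shorter edge
        in_tree[j] = True
        edges.append((labels[i], labels[j], d))
        # Offer every edge from the newly added sample to samples still outside the tree.
        for k in range(n):
            if not in_tree[k]:
                heapq.heappush(heap, (dist[j][k], j, k))
    return edges
```

Ties between equal distances are broken here by heap order; MST-based tools differ in how they resolve such ties, which is one reason their trees can differ from this plain version.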
Maximum-likelihood trees in IQ-TREE were constructed with 1,000
ultrafast bootstrap replicates (Hoang et al., 2018) using the best-fit
substitution model selected by IQ-TREE’s ModelFinder (Kalyaanamoorthy et al.,
2017) according to the Bayesian Information Criterion (BIC). Tree
reconstructions based on Bayesian inference in MrBayes were run
for 1,000,000 generations, sampling every 100 generations, with a
burn-in of 25%. To check for convergence of all parameters and adequacy
of the burn-in, we investigated the uncorrected potential scale
reduction factor (PSRF) (Gelman and Rubin, 1992) as calculated by
MrBayes. We used T. pallidum subsp. endemicum strain Iraq
B (GenBank CP032303.1) as an outgroup to root the trees.
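IQ-TREE and MrBayes compute the model-selection and convergence statistics described above internally; the sketch below only illustrates the underlying quantities. BIC is computed as k·ln(n) − 2·lnL, taking n as the number of alignment sites (an assumption about the sample size used). The burn-in step drops the first 25% of each run's samples (with sampling every 100 generations over 1,000,000 generations, each run yields about 10,000 samples). The PSRF follows the uncorrected Gelman and Rubin (1992) form for a single parameter; MrBayes's exact implementation may differ in detail, and values close to 1.0 are commonly read as evidence of convergence. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion; lower values indicate a preferred model."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model_by_bic(fits: dict[str, tuple[float, int]], n_sites: int) -> str:
    """fits maps a model name to (log-likelihood, number of free parameters)."""
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_sites))


def discard_burnin(samples: list[float], fraction: float = 0.25) -> list[float]:
    """Drop the first `fraction` of the sampled values from one run."""
    return samples[int(len(samples) * fraction):]


def psrf(chains: list[list[float]]) -> float:
    """Uncorrected potential scale reduction factor for one parameter.

    chains: post-burn-in samples from m >= 2 independent runs,
            each of equal length n >= 2.
    """
    m, n = len(chains), len(chains[0])
    chain_means = [mean(c) for c in chains]
    w = mean(variance(c) for c in chains)   # mean within-run variance
    b = n * variance(chain_means)           # between-run variance
    v_hat = (n - 1) / n * w + (1 + 1 / m) * b / n
    return math.sqrt(v_hat / w)
```

In use, each run's trace for a parameter would first be passed through `discard_burnin`, and the trimmed traces from the independent runs would then be given to `psrf` together.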