Phylogenetic analyses
The nucleotide sequences of the 13 protein-coding genes (PCGs) and two
ribosomal RNAs (12S rRNA and 16S rRNA), together with the amino acid
sequences of the 13 PCGs, were aligned using MAFFT v7.394 (Katoh &
Standley, 2013) with the highly accurate L-INS-i strategy. The
alignments were trimmed using trimAl v1.4.1 (Capella-Gutiérrez et al.,
2009) with the heuristic method ‘automated1’ to remove gap-only and
ambiguous-only positions, and concatenated using FASconCAT-G v1.04
(Kück & Longo, 2014). Finally, we
generated three matrices for tree inference: (1) amino acid sequences
of the 13 PCGs (PCGs_faa); (2) nucleotide sequences of the 13 PCGs with
third codon positions excluded (PCG12_fna); and (3) nucleotide
sequences of PCG12_fna plus the two ribosomal RNAs (PCG12_fna plus two
rRNAs). Third codon positions were excluded from the nucleotide-based
analyses to reduce the risk of bias or long-branch attraction caused by
substitution saturation among species belonging to different genera. We
applied both
partitioned and non-partitioned approaches for phylogenetic inference.
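
As an illustration of how third codon positions can be excluded to
obtain a matrix such as PCG12_fna, the following minimal Python sketch
keeps only codon positions 1 and 2 of each aligned sequence. It is not
the procedure used in this study: the function names and the toy
sequences are hypothetical, and the sketch assumes that every alignment
is in frame (column 1 is a first codon position and the length is a
multiple of three).

```python
from typing import Dict


def drop_third_codon_positions(aligned_cds: str) -> str:
    """Keep codon positions 1 and 2 of an in-frame aligned sequence."""
    if len(aligned_cds) % 3 != 0:
        # Assumption: the alignment preserves the reading frame.
        raise ValueError("alignment length must be a multiple of 3")
    # 0-based indices 2, 5, 8, ... are the third codon positions.
    return "".join(
        base for index, base in enumerate(aligned_cds) if index % 3 != 2
    )


def build_pcg12(alignment: Dict[str, str]) -> Dict[str, str]:
    """Apply the third-position filter to every taxon in an alignment."""
    return {
        taxon: drop_third_codon_positions(sequence)
        for taxon, sequence in alignment.items()
    }


# Hypothetical toy alignment of three codons per taxon.
toy_alignment = {
    "taxon_A": "ATGGCTTTA",
    "taxon_B": "ATGGCCCTA",
}
pcg12 = build_pcg12(toy_alignment)
for taxon, sequence in pcg12.items():
    print(taxon, sequence)
```

In this toy case the synonymous third-position difference between the
two taxa (GCT versus GCC) is removed, whereas the first-position
difference (TTA versus CTA) is retained.
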
Partitioned maximum likelihood (ML) reconstructions were performed
using IQ-TREE v1.6.3 (Nguyen et al., 2015) with 1,000 ultrafast
bootstrap (UFBoot) replicates (Hoang et al., 2018) and 1,000 SH-aLRT
replicates (Guindon et al., 2010). The option ‘-m MFP+MERGE’ was
applied to all three matrices. Non-partitioned reconstructions were
made using site-heterogeneous models in both ML and Bayesian inference
(BI). The posterior mean site frequency (PMSF) model (Wang et al.,
2018) was used for the PCGs_faa matrix by specifying a profile mixture
model with the option ‘-m mtInv+C60+FO+R’ in IQ-TREE. The corresponding
partitioned tree (PCGs_faa matrix) was used as the initial guide tree.
Bayesian inference using PhyloBayes
MPI v1.8b (Lartillot et al., 2013) was also performed for the PCGs_faa
matrix. Two independent chains were run for 10,000 generations under
the CAT+GTR model (Lartillot & Philippe, 2004), using a starting tree
derived from the PMSF ML analysis. We used the programs bpcomp (maxdiff
value) and tracecomp (minimum effective size) to check for convergence,
which was accepted when the maxdiff value was smaller than 0.3 and the
minimum effective sizes were larger than 50. In addition, pairwise
distances (p-distances) calculated from the nucleotide sequences of the
13 PCGs are shown in Table S3.
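
For reference, the p-distance between two aligned sequences is the
proportion of compared sites at which they differ. The Python sketch
below illustrates this definition only. The software and the gap
treatment used to produce Table S3 are not stated here, so the pairwise
deletion of sites containing gaps or ambiguous bases, the function
names, and the toy sequences are all assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, Tuple

VALID_BASES = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same aligned length")
    compared = 0
    different = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with a gap or ambiguity code in either
        # sequence are skipped (pairwise deletion).
        if base_a not in VALID_BASES or base_b not in VALID_BASES:
            continue
        compared += 1
        if base_a != base_b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return different / compared


def pairwise_p_distances(
    alignment: Dict[str, str]
) -> Dict[Tuple[str, str], float]:
    """Compute the p-distance for every unordered pair of taxa."""
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment: two of eight sites differ.
toy_alignment = {
    "taxon_A": "ACGTACGT",
    "taxon_B": "ACGTTCGA",
}
distances = pairwise_p_distances(toy_alignment)
for pair, distance in distances.items():
    print(pair, distance)
```
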