Genome assembly, gene function annotation and phylogenetic
analyses
A total of 7,225,123 raw Illumina sequences were obtained, then filtered
and cleaned using Trim Galore v0.6.7
(https://github.com/FelixKrueger/TrimGalore). Of these, 7,206,640
sequences, with both mates of each pair retained, were used for genome
assembly.
The de novo genome assembly was performed with the multi-platform
genome assembly pipeline (MpGAP) v3.1, run with Nextflow v21.10.6 and
MaSuRCA v4.0.5. Assembly quality was assessed with QUAST v5.0.2 and
BUSCO v5.4.2 using the Saccharomycetales_odb10 database. The
best-performing assembly (highest N50 and largest contig size, in Kbp)
was then annotated using the Funannotate pipeline v1.8.14. Contigs
smaller than 500 base pairs were removed, and repeated sequences were
masked using default settings. Gene prediction was carried out ab initio using
Augustus, HiQ, GlimmerHMM, SNAP, and GeneMark. tRNAs were predicted
using tRNAscan-SE v2.0.9, a program included in Funannotate. Genes were
functionally annotated with InterProScan, and KEGG and the KofamKOALA
web server were used to predict the gene functions of N. atacamensis
ATA-11A-B. The average nucleotide identity (ANI) between Nakazawaea
genomes was estimated from different available assemblies using
OrthoANI (Lee et al., 2016).
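The assembly selection criterion (highest N50) and the 500 bp contig
filter described above can be illustrated with a minimal Python sketch.
This is not how QUAST or Funannotate implement these steps internally;
the function names n50 and filter_contigs and the toy data are
hypothetical and serve only to make the two definitions concrete.

```python
def n50(lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together cover at least half of the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def filter_contigs(contigs, min_len=500):
    """Keep contigs of at least min_len bases.

    contigs maps a contig name to its sequence; contigs shorter
    than min_len are dropped, mirroring the 500 bp cutoff in the text.
    """
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_len}


# Toy values (hypothetical, not the N. atacamensis assembly).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)

toy_contigs = {"c1": "A" * 499, "c2": "A" * 500, "c3": "A" * 1200}
kept = filter_contigs(toy_contigs)
```

With the toy lengths, the two longest contigs (500 and 400) are the
first to reach half of the 1,500-base total, so the N50 is 400; the
499-base contig is the only one removed by the filter.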
A maximum-likelihood (ML) phylogenetic tree was constructed using
protein sequences predicted from N. atacamensis and four other
Nakazawaea species (N. peltata, N. holstii, N. ishiwadae, and
N. ambrosiae). The yeast species P. tannophilus was used as an
outgroup. OrthoFinder v2.4.1 was
employed to identify orthologous protein groups among these six
different species. A total of 2,422 single-copy orthologs present in
all species were identified and aligned using MUSCLE v3.8.15 (Edgar,
2004). The alignments were concatenated and used to infer a
maximum-likelihood tree with RAxML v8.2.12 (-f a -x 12345 -p 12345
-# 100 -m PROTGAMMAJTT -k).
The phylogenetic tree was visualized and plotted using iTOL v5.
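
The concatenation step, in which the per-ortholog alignments are joined
into a single supermatrix before tree inference, can be sketched as
follows. This is a minimal illustration under stated assumptions, not
the script used in this study: the function name
concatenate_alignments, the taxon labels, and the toy sequences are all
hypothetical, and each alignment is assumed to be already loaded as a
mapping from taxon name to aligned sequence.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-ortholog alignments into one supermatrix.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    taxa: taxon names that must be present in every alignment
          (true for single-copy orthologs found in all species).
    Returns a dict mapping taxon name -> concatenated sequence.
    """
    parts = {taxon: [] for taxon in taxa}
    for index, aln in enumerate(alignments):
        # All rows of one alignment must have the same number of columns.
        widths = {len(aln[taxon]) for taxon in taxa}
        if len(widths) != 1:
            raise ValueError(f"alignment {index} has rows of unequal length")
        for taxon in taxa:
            parts[taxon].append(aln[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data (hypothetical): two tiny alignments for three taxa.
toy_taxa = ["sp_A", "sp_B", "outgroup"]
toy_alignments = [
    {"sp_A": "MK-L", "sp_B": "MKAL", "outgroup": "MR-L"},
    {"sp_A": "GT", "sp_B": "G-", "outgroup": "GT"},
]
supermatrix = concatenate_alignments(toy_alignments, toy_taxa)
```

Keeping the taxon order fixed and checking column counts per alignment
guards against shifted rows, which would otherwise silently corrupt the
concatenated matrix passed to the tree-building program.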