Phylogenomic studies in L. cidri
To initiate the study of the phylogeny and genomic variation in L. cidri, we first estimated the ploidy of the isolates. FACS analysis revealed that all L. cidri isolates were haploid (Fig. S2). We then sequenced the complete genomes of 55 strains (30 from South America and 25 from Australia) and incorporated previously published data for the reference strain L. cidri CBS2950 (isolated from cider in France) and for L. fermentati strain CBS707, which we used as an outgroup. Across the 30 South American genomes, we obtained on average 2,670 SNPs per strain relative to the reference genome (on average one SNP every ~3.8 kb between two strains). In contrast, across the 25 Australian genomes, we obtained on average only 36 SNPs per strain relative to the reference genome (one SNP every ~282 kb), roughly 74-fold fewer than in the South American strains. We also found different numbers of insertions and deletions (INDELs) relative to the reference genome depending on the strain, ranging from 79 in the Australian strains to 124 in the South American strains (bioinformatic summary statistics are shown in Table S6). Interestingly, this relatively high number of INDELs stems from variants unique to the reference strain, rather than reflecting a general trend between any two strains (Table S6).
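As a quick consistency check on the two SNP-spacing figures, the short Python sketch below multiplies each mean SNP count by its reported spacing. It assumes that "one SNP every X kb" is simply genome length divided by SNPs per strain; the genome size itself is not stated in this section, and the function name is hypothetical. Both groups imply a genome of roughly 10.15 Mb, so the contrast in spacing is driven by the roughly 74-fold difference in SNP counts rather than by a difference in the amount of sequence surveyed.

```python
# Back-of-the-envelope check of the SNP spacing figures reported above.
# Assumption: "one SNP every ~X kb" = genome length / SNPs per strain,
# so SNP count * spacing recovers the implied genome length.

def implied_genome_mb(snps_per_strain: float, spacing_kb: float) -> float:
    """Genome length (Mb) implied by a SNP count and an average inter-SNP spacing (kb)."""
    return snps_per_strain * spacing_kb / 1000.0

soam_mb = implied_genome_mb(2670, 3.8)  # South American average
aus_mb = implied_genome_mb(36, 282)     # Australian average
fold = 2670 / 36                        # ratio of mean SNP counts

print(f"SoAm: {soam_mb:.2f} Mb, Aus: {aus_mb:.2f} Mb, SNP-count ratio: {fold:.0f}x")
```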
The maximum-likelihood phylogeny revealed a topology with two well-supported main clades separating the South American and Australian strains (Fig. 1b). The Australian strains clustered into a single clade (hereafter referred to as Aus) together with the European reference strain L. cidri CBS2950. This clustering of Australian and European strains, together with the low number of polymorphisms found between them (Table S7), suggests a recent migration event between the two regions. In contrast, the South American strains showed a more complex clade structure, with substantial differences in branch lengths and several subclades with phylogeographic structure (Fig. 1b). The South American clade (hereafter referred to as SoAm) harbored 15,706 unique SNPs compared to the reference strain (Table S7). Interestingly, the two isolates from Altos de Lircay National Park (AL), the northernmost locality (Fig. 1a), clustered together and harbored the greatest genetic divergence within the SoAm group (Fig. 1b). Overall, these results suggest greater genetic diversity in the SoAm lineage than in the Aus lineage, with the AL branch showing the highest genetic divergence along the latitudinal gradient (Fig. 1b).
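The divergence patterns above come from branch lengths in the maximum-likelihood tree. A much simpler proxy, included only to illustrate what "more divergence" and "clade-specific SNPs" mean at the level of a variant table, is the pairwise SNP distance between haploid strains. The Python sketch below uses an invented toy genotype matrix: the strain names, alleles and the 0/1 (reference/alternative) encoding are hypothetical and are not data from this study, and the approach is not the phylogenetic method used here.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy data: one allele per variable site for each haploid strain,
# 0 = reference allele, 1 = alternative allele.
genotypes = {
    "AL_1":   [1, 1, 1, 0, 1, 1, 0, 1],
    "AL_2":   [1, 1, 1, 0, 1, 0, 0, 1],
    "SoAm_3": [1, 0, 0, 1, 0, 0, 0, 0],
    "SoAm_4": [1, 0, 0, 1, 0, 0, 1, 0],
    "Aus_1":  [0, 0, 0, 0, 0, 0, 0, 0],
    "Aus_2":  [0, 0, 0, 0, 0, 0, 1, 0],
}

def snp_distance(a, b):
    """Number of sites at which two haploid strains carry different alleles."""
    return sum(x != y for x, y in zip(a, b))

def mean_within(group):
    """Mean pairwise SNP distance among the named strains."""
    return mean(snp_distance(genotypes[s], genotypes[t])
                for s, t in combinations(group, 2))

def private_sites(group):
    """Indices of sites where the alternative allele occurs only in strains of `group`."""
    others = [g for name, g in genotypes.items() if name not in group]
    n_sites = len(next(iter(genotypes.values())))
    return [i for i in range(n_sites)
            if any(genotypes[s][i] for s in group) and not any(g[i] for g in others)]

soam = ["AL_1", "AL_2", "SoAm_3", "SoAm_4"]
aus = ["Aus_1", "Aus_2"]
print(f"mean within SoAm: {mean_within(soam):.2f}, within Aus: {mean_within(aus):.2f}")
print("SoAm-private sites:", private_sites(soam))
```

Because the isolates are haploid, each strain contributes a single allele per site, so a plain count of differing sites is enough; diploid data would need a genotype-aware distance.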