3 Case study
To illustrate the performance of MeStudio, we used a recently published SMRT dataset (diCenzo et al., 2022) comparing some of the methylation features of two Sinorhizobium meliloti strains, 1021 and FSM-MA, grown to stationary phase in minimal medium (Table 2, Figure 2B). On the SMRT-assembled reads of the genomes of the two strains, MeStudio identified a total of 28 motifs (Table 2). All but six motifs (namely AGAAAAT, DCTGCAGGS, RAGCWGCTY, RAGCWGCTY, RCTGCAGGS, TGGGCA) were common to both strains. The number of retrieved methylated sites ranged from a few (especially for private motifs, i.e., those present in one strain only) to several thousand (as for GANTC, a classical motif that is methylated by the CcrM DNA methylase and is involved in cell cycle regulation; Mouammine and Collier, 2018). CDS and nCDS showed similar values, as expected given that methylation is present on both DNA strands. Intergenic sequences (tIG) showed the lowest number of methylated sites, whereas sequences upstream of a gene (UP), which correspond to putative promoter regions, showed values generally one order of magnitude higher than tIG; in some cases, values differed between strains by about two-fold (e.g., CTYCCAG and GCCAGG). Finally, the presence of motifs in one strain only may suggest the occurrence of strain-specific restriction-modification systems, although the small number of methylated sites is also compatible with alternative hypotheses (i.e., methylation restricted to some genomic regions and related to the regulation of expression at specific loci). Demo files for input and output are available at https://github.com/combogenomics/MeStudio.
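The strain comparison described above (shared versus private motifs, and motifs whose counts differ by about two-fold) can be sketched in a few lines of Python. This is only an illustrative sketch and not part of MeStudio: the function name `compare_motif_counts`, the strain labels, the placeholder motifs `PRIVATE_X` and `PRIVATE_Y`, and all counts are invented for the example and are not values from Table 2.

```python
def compare_motif_counts(counts_a, counts_b, fold_threshold=2.0):
    """Classify motifs as private or shared and flag large count differences.

    counts_a, counts_b: dict mapping motif -> number of methylated sites.
    Returns (private_a, private_b, flagged), where `flagged` maps each shared
    motif whose larger/smaller count ratio is >= fold_threshold to that ratio.
    """
    # A motif is considered present in a strain if it has at least one site.
    motifs_a = {m for m, n in counts_a.items() if n > 0}
    motifs_b = {m for m, n in counts_b.items() if n > 0}

    private_a = sorted(motifs_a - motifs_b)
    private_b = sorted(motifs_b - motifs_a)

    flagged = {}
    for motif in sorted(motifs_a & motifs_b):
        a, b = counts_a[motif], counts_b[motif]
        ratio = max(a, b) / min(a, b)  # both counts are > 0 here
        if ratio >= fold_threshold:
            flagged[motif] = ratio
    return private_a, private_b, flagged


# Invented placeholder counts (NOT taken from Table 2).
strain_a = {"GANTC": 4000, "CTYCCAG": 120, "PRIVATE_X": 5}
strain_b = {"GANTC": 4100, "CTYCCAG": 60, "PRIVATE_Y": 3}

private_a, private_b, flagged = compare_motif_counts(strain_a, strain_b)
print("private to A:", private_a)
print("private to B:", private_b)
print("about two-fold or more:", flagged)
```

The ratio is taken as larger over smaller so that the threshold applies symmetrically, regardless of which strain carries more methylated sites.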

4 Discussion

We have described MeStudio, a novel software tool for the analysis of DNA methylation profiles obtained by single-molecule real-time (SMRT) sequencing. Compared with the few existing tools, MeStudio offers several novel and useful features: it produces GFF and BED files reporting the positions of methylated sites and methylated motifs, the number of methylated sites and the methylation profiles for each genomic feature, as well as graphical outputs. The genomic features analysed include genic and intergenic regions (hence comprising putative promoters), allowing the formulation of hypotheses on the role of DNA methylation in the regulation of gene expression and in other relevant biological phenomena. Although it was developed for prokaryotic genomes, MeStudio can handle any kind of sequence, provided that a suitable set of input files is supplied (Figure 1). By reporting motif occurrence and genomic localization, MeStudio provides the basis for comparative analyses of DNA methylation profiles among strains, for example in evolutionary studies on populations and species and in studies of epigenomic modifications during adaptation and development.
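As an illustration of what counting methylated sites per genomic feature involves, the following Python sketch tallies site positions over annotated intervals. It is not MeStudio's implementation and does not reproduce its file formats: the function name `count_sites_per_category`, the tuple layout, and the example coordinates are assumptions made for this example, with intervals taken as 0-based and half-open, as in the BED convention.

```python
from bisect import bisect_left
from collections import Counter


def count_sites_per_category(features, site_positions):
    """Count methylated sites falling inside each feature category.

    features: iterable of (category, start, end) with 0-based, half-open
              coordinates (start <= position < end).
    site_positions: iterable of 0-based positions of methylated sites.
    A site covered by several intervals is counted once per interval.
    """
    positions = sorted(site_positions)
    counts = Counter()
    for category, start, end in features:
        # bisect_left(positions, x) is the number of positions < x, so the
        # difference is the number of positions in [start, end).
        counts[category] += bisect_left(positions, end) - bisect_left(positions, start)
    return counts


# Hypothetical intervals and site positions, for illustration only.
features = [("UP", 50, 100), ("CDS", 100, 400), ("tIG", 400, 600)]
sites = [60, 99, 100, 250, 399, 400, 700]

for category, n in sorted(count_sites_per_category(features, sites).items()):
    print(category, n)
```

Sorting the positions once and using binary search keeps the tally efficient when a genome carries thousands of methylated sites and features.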
Finally, MeStudio is user friendly: it is easy to install and can be run as a pipeline with a single command-line call. We developed the scripts in macOS and Linux environments, and support for Windows platforms may be added in the near future.