2 Design and implementation
MeStudio consists of several tools that can be run individually or as
part of a pipeline and uses a naive string matching algorithm to map
motif sequences to the reference genome. The required input data consist
of only three files: (i) a FASTA file containing the genome sequence, (ii)
a genomic annotation file in GFF3 format and (iii) a second GFF3 file
containing the methylated nucleotide positions. The latter is
automatically generated from the output of the SMRTlink software of
Pacific Biosciences DNA sequencers. As output, MeStudio produces
several files, including (i) a text file with summary statistics
on the methylation occurrences along the genomic features, (ii)
distribution plots and (iii) BED files containing the protein annotation of
the genes in which methylated motifs have been found. A workflow is
provided in Figure 1.
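
The naive string matching step mentioned above can be illustrated with a minimal Python sketch. It is not MeStudio's actual code: the function name naive_motif_search, the IUPAC lookup table and the choice of 0-based coordinates are assumptions made for illustration only, and the sketch scans a single strand.

```python
# Illustrative sketch only; names and conventions are assumptions, not MeStudio's API.
# Maps each IUPAC nucleotide code in a motif to the genome bases it may match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def naive_motif_search(genome, motif):
    """Return the 0-based start positions at which motif matches genome.

    Every candidate start is compared symbol by symbol (brute force), so the
    worst-case cost is O(len(genome) * len(motif)).
    """
    genome = genome.upper()
    motif = motif.upper()
    hits = []
    for start in range(len(genome) - len(motif) + 1):
        for offset, symbol in enumerate(motif):
            # Stop at the first genome base not allowed by the motif symbol.
            if genome[start + offset] not in IUPAC[symbol]:
                break
        else:
            # The inner loop finished without a mismatch: record the hit.
            hits.append(start)
    return hits


if __name__ == "__main__":
    toy_genome = "TTGATCAAGAATCGG"  # toy sequence, not real data
    print(naive_motif_search(toy_genome, "GATC"))
    print(naive_motif_search(toy_genome, "GANTC"))
```

A brute-force scan of this kind needs no index or preprocessing of the genome, which keeps the mapping step simple at the cost of a running time proportional to the product of genome length and motif length.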