Performance comparison
To highlight SEGUL performance, we compared features of the command line
version of SEGUL v0.19.0 with those of AMAS v1.02 (see Borowiec 2016),
the fastest alternative application and the most comparable with SEGUL.
We tested the applications under different settings to see how
performance differed between similar features of the programs. For
example, we tested AMAS using the “--check-align” option to make it more
comparable with SEGUL, which checks the alignment by default. We also
included goalign v0.3.5 in the alignment concatenation performance test.
We recorded CPU core usage to see how it affected the performance of the
programs. SEGUL detects available CPU cores and uses them according to
the current workload. AMAS and goalign, on the other hand, have settings
for the core count. We set AMAS to use all available cores in
comparisons with SEGUL. As with SEGUL, the actual number of cores used
by AMAS depends on the workload and, to some extent, on the
multithreading algorithm implemented by the programming language.
Our early tests showed that all applications performed best on the Linux
operating system (Figure S1). Thus, we ran all subsequent performance
tests on Linux. We used six published genomic datasets (four DNA
sequence datasets and two amino acid (AA) sequence datasets) with a
range of taxon, site, and character counts (Table 2). We downloaded the
datasets either directly from the original sources
or using BenchmarkAlignments scripts
(https://github.com/roblanf/BenchmarkAlignments).
To simplify visualization, we transformed the execution times using the
natural logarithm (Figure 1).
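
As an illustration of this transformation, the following minimal Python
sketch applies the natural logarithm to a set of execution times. The
function name and the timing values are hypothetical and are not taken
from our benchmark results.

```python
import math


def log_transform(times_seconds):
    """Return the natural log of each execution time.

    Every time must be strictly positive, since log is undefined at 0.
    """
    return [math.log(t) for t in times_seconds]


# Hypothetical timings in seconds (illustration only, not measured data).
example_times = [0.5, 25.0]
log_times = log_transform(example_times)

# On the log scale, a difference between two values equals the log of
# their ratio, so a fixed fold difference maps to a fixed distance.
log_gap = log_times[1] - log_times[0]
```

Because a fixed fold difference corresponds to a fixed distance on the
log scale, tools whose run times differ by orders of magnitude remain
readable in a single plot.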
Except for alignment concatenation, SEGUL was consistently faster than
AMAS (Table 3, Figure 1). The starkest difference was in summary
statistic calculations, where SEGUL was 51.7 times faster than AMAS
despite producing more statistics. AMAS was noticeably slower when using
the “--check-align” setting. For example, on average across all
datasets, AMAS with “--check-align” was 53.4 times slower than SEGUL for
alignment concatenation, whereas without the setting AMAS was 1.2 times
faster than SEGUL. SEGUL was 2.2 times faster than goalign for alignment
concatenation.
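
A fold difference of this kind can be sketched as below. The sketch
assumes one particular averaging scheme, the mean of per-dataset ratios,
which is only one of several reasonable choices (a ratio of mean times
is another). The function name and the input values are hypothetical.

```python
from statistics import mean


def mean_fold_difference(reference_times, other_times):
    """Average, across datasets, of other_time / reference_time.

    Assumption: each position in the two lists refers to the same
    dataset. A result above 1 means the reference tool was faster.
    """
    if len(reference_times) != len(other_times):
        raise ValueError("both tools need one timing per dataset")
    return mean(o / r for r, o in zip(reference_times, other_times))


# Hypothetical timings in seconds for two datasets (illustration only).
reference = [1.0, 2.0]
other = [2.0, 8.0]
fold = mean_fold_difference(reference, other)  # ratios are 2.0 and 4.0
```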
Regardless of the feature compared, SEGUL used less RAM than AMAS.
Across all tested datasets, the starkest contrast was in sequence
removal, where SEGUL used on average 0.05 times the RAM that AMAS used.
To our surprise, goalign, written in the compiled programming language
Go, used 1.4 times more RAM than AMAS, which is written in Python, when
concatenating alignments.
CPU usage varied across the tested features. For alignment
concatenation, where AMAS was faster than SEGUL, AMAS used five times
more cores than SEGUL while providing a 1.2-fold speedup. For summary
statistics, SEGUL used five times more cores than AMAS while providing a
51-fold speedup. Other feature comparisons showed nearly equal CPU
usage. For goalign, the multi-core setting was slower than the
single-core setting.
Table 2. Dataset sources and alignment statistics.