Performance comparison
To highlight SEGUL's performance, we compared features of the command-line version of SEGUL (v0.19.0) with those of AMAS v1.02 (see Borowiec 2016), the fastest alternative application and the one most comparable to SEGUL. We tested the applications under different settings to see how performance differs between similar features of the programs. For example, we ran AMAS with the “--check-align” option to make it more comparable with SEGUL, which checks the alignment by default. We also included goalign v0.3.5 in the alignment concatenation performance test.
We recorded CPU core usage to see how it affected the performance of the programs. SEGUL detects the available CPU cores and uses them according to the current workload. AMAS and goalign, on the other hand, provide settings for the core count. We set AMAS to use all available cores in comparisons with SEGUL. As with SEGUL, the actual number of cores used by AMAS depends on the workload and, to some extent, on the multithreading algorithm implemented by the programming language.
Our early tests showed that all applications performed best on the Linux operating system (Figure S1). Thus, we ran all subsequent performance tests on Linux. We used six published genomic datasets (four DNA sequence datasets and two amino acid (AA) sequence datasets) with a range of taxon, site, and character counts (Table 2). We downloaded the datasets either directly from the original sources or using the BenchmarkAlignments scripts (https://github.com/roblanf/BenchmarkAlignments). To simplify visualization, we transformed the execution times using the natural logarithm (Figure 1).
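The natural-log transformation of execution times can be illustrated with a minimal Python sketch. The tool names and timings below are hypothetical placeholders, not measurements from this study, and the function name is our own. The sketch only shows how the transformation compresses values that span several orders of magnitude onto a scale that is easier to plot.

```python
import math

# Hypothetical execution times in seconds (placeholders, not results
# from this study). They span several orders of magnitude on purpose.
execution_seconds = {"tool_a": 0.5, "tool_b": 25.0, "tool_c": 1200.0}


def log_transform(times):
    """Return a dict of natural-log-transformed execution times."""
    return {name: math.log(seconds) for name, seconds in times.items()}


log_seconds = log_transform(execution_seconds)
for name, value in log_seconds.items():
    print(f"{name}: {value:.2f}")
```

Because the logarithm is monotonic, the ranking of the programs is unchanged by the transformation. Execution times below one second map to negative values.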
Except for alignment concatenation, SEGUL was consistently faster than AMAS (Table 3, Figure 1). The starkest difference was in summary statistic calculations, where SEGUL was 51.7 times faster than AMAS despite producing more statistics. AMAS was noticeably slower when using the “--check-align” setting. For example, on average across all datasets, AMAS with “--check-align” was 53.4 times slower than SEGUL for alignment concatenation, whereas without this setting it was 1.2 times faster. SEGUL was 2.2 times faster than goalign for alignment concatenation.
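A fold difference of this kind can be computed as in the following minimal Python sketch. The runtimes are hypothetical placeholders, not values from Table 3. The averaging scheme is also an assumption: the sketch takes the arithmetic mean of per-dataset ratios, which is one of several reasonable ways to summarize across datasets.

```python
from statistics import mean

# Hypothetical per-dataset runtimes in seconds (placeholders only, not
# values from this study). Each position is one dataset.
baseline_seconds = [10.0, 40.0, 90.0]   # slower reference program
candidate_seconds = [2.0, 10.0, 30.0]   # program being evaluated


def fold_differences(baseline, candidate):
    """Return per-dataset ratios of baseline runtime to candidate runtime."""
    if len(baseline) != len(candidate):
        raise ValueError("both programs must be timed on the same datasets")
    return [b / c for b, c in zip(baseline, candidate)]


folds = fold_differences(baseline_seconds, candidate_seconds)
# Assumption: summarize with the arithmetic mean of per-dataset ratios.
mean_fold = mean(folds)
print(f"per-dataset folds: {folds}, mean fold: {mean_fold:.1f}")
```

A ratio above 1 means the candidate program was faster than the baseline, and a ratio below 1 means it was slower. The same calculation applies to RAM usage when memory values replace runtimes.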
Regardless of the feature compared, SEGUL used less RAM than AMAS. Across all tested datasets, the starkest contrast was in sequence removal, where SEGUL used on average only 0.05 times the RAM that AMAS used. To our surprise, goalign, which is written in the compiled programming language Go, used 1.4 times more RAM than AMAS, which is written in Python, when concatenating alignments.
CPU usage varied across the tested features. For alignment concatenation, where AMAS was faster than SEGUL, AMAS used five times more cores than SEGUL while providing a 1.2-fold speedup. For summary statistics, SEGUL used five times more cores than AMAS while providing a 51-fold speedup. Other feature comparisons showed nearly equal CPU usage. goalign was slower with its multi-core setting than with its single-core setting.
Table 2. Dataset sources and alignment statistics.