Feature comparison
We compare SEGUL to goalign (Lemoine & Gascuel, 2021) and AMAS
(Borowiec, 2016) because they share similar performance and use cases.
goalign is the most feature-rich (Table 1); however, most of its
functions operate on a single file; two exceptions are alignment
concatenation (multiple files) and sequence alignment (two files). AMAS
and SEGUL, on the other hand, operate on single or multiple files
without extra effort from the user. All applications provide APIs, but
neither AMAS nor goalign offers a GUI. Several features (e.g., sample
mapping, sequence unique ID parsing, and partition format conversion)
are unique to SEGUL.
While several SEGUL features overlap with AMAS and goalign, SEGUL
provides more functionality. For instance, SEGUL generates summary
statistics for alignment files, raw read sequences, and contiguous
sequences, whereas AMAS and goalign only support alignment files. The
raw read summary statistics feature provides a simplified version of the
statistics generated by FastQC (Andrew, 2010). It outputs CSV files and
is designed to compare many raw read sequences quickly without an
additional application (e.g., MultiQC to summarize FastQC results). For
alignment files, SEGUL always checks that the sequences within each
alignment are the same length and, by default, checks that the sequences
contain only valid IUPAC characters. To speed up processing for
most features, users can skip the IUPAC check using the
“--ignore-datatype” option, but alignment length is always checked.
AMAS does not check for IUPAC characters, and checking that the
sequences are aligned is optional. goalign checks both but does not
generate alignment partitions. As another example, the SEGUL sequence
removal feature supports regular expression, file, and terminal input,
while AMAS supports only terminal input. As noted above, goalign
sequence removal works on only a single file.
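
The two alignment checks described above can be illustrated with a minimal Python sketch. It is not SEGUL's implementation. The function name `check_alignment`, the `check_iupac` parameter, and the `IUPAC_DNA` character set are hypothetical names introduced here, and the sketch assumes DNA data only. The length check always runs, while the character check can be switched off, mirroring the behavior described in the text.

```python
# Illustrative sketch only; names and character set are assumptions,
# not taken from SEGUL, AMAS, or goalign.

# IUPAC nucleotide codes plus common gap and missing-data symbols.
IUPAC_DNA = set("ACGTURYSWKMBDHVN-?.")


def check_alignment(seqs, check_iupac=True):
    """Return a list of problems found in a {name: sequence} mapping."""
    problems = []

    # The length check is unconditional: all sequences must be equal length.
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) > 1:
        problems.append(f"unequal sequence lengths: {sorted(lengths)}")

    # The character check is optional and can be skipped for speed.
    if check_iupac:
        for name, seq in seqs.items():
            invalid = set(seq.upper()) - IUPAC_DNA
            if invalid:
                problems.append(f"{name}: invalid characters {sorted(invalid)}")

    return problems
```

Calling `check_alignment(seqs, check_iupac=False)` skips the per-character scan, which is the more expensive of the two checks because it visits every site of every sequence.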
Table 1. Feature comparison among SEGUL v0.19.0, AMAS v1.02, and goalign
v0.3.5.