Feature comparison
We compare SEGUL to goalign (Lemoine & Gascuel, 2021) and AMAS (Borowiec, 2016) because they have similar performance and use cases. goalign is the most feature-rich (Table 1); however, most of its functions operate on a single file, with two exceptions: alignment concatenation (multiple files) and sequence alignment (two files). AMAS and SEGUL, on the other hand, operate on single or multiple files without extra effort from the user. All three applications provide APIs, but neither AMAS nor goalign offers a GUI. Several features (e.g., sample mapping, sequence unique ID parsing, and partition format conversion) are unique to SEGUL.
While several SEGUL features overlap with those of AMAS and goalign, SEGUL's versions often provide more functionality. For instance, SEGUL generates summary statistics for alignment files, raw read sequences, and contiguous sequences, whereas AMAS and goalign only support alignment files. SEGUL's raw read summary statistics feature provides a simplified version of the statistics generated by FastQC (Andrews, 2010). It outputs CSV files and is designed to compare many raw read sequences quickly without an additional application (e.g., MultiQC to summarize FastQC results). For alignment files, SEGUL always checks that the sequences within each alignment are the same length and, by default, that they contain only valid IUPAC characters. To speed up processing for most features, users can skip the IUPAC check using the "--ignore-datatype" option, but alignment length is always checked. AMAS does not check for IUPAC characters, and its alignment check is optional. goalign checks both but does not generate alignment partitions. As another example, the SEGUL sequence removal feature supports regular expression, file, and terminal input, while AMAS supports only terminal input. As noted above, the goalign sequence removal feature works on only a single file.
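As a rough illustration of the two alignment checks described above, the following Python sketch validates a single alignment held in memory. It is not the implementation used by SEGUL, AMAS, or goalign; the function name `check_alignment`, the `check_iupac` parameter, and the accepted character set (IUPAC nucleotide codes plus gap and missing-data symbols) are assumptions made for this example.

```python
# Illustrative sketch only; not code from SEGUL, AMAS, or goalign.
# Assumed character set: IUPAC nucleotide codes plus gap/missing symbols.
IUPAC_DNA = set("ACGTURYSWKMBDHVN-?.")


def check_alignment(seqs, check_iupac=True):
    """Validate a {name: sequence} mapping and return the alignment length.

    The length check always runs; the character check can be skipped,
    mirroring the behaviour described in the text.
    """
    if not seqs:
        raise ValueError("empty alignment")
    # All sequences in an alignment must share one length.
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    if check_iupac:
        for name, seq in seqs.items():
            # Characters outside the allowed set are reported per sequence.
            invalid = set(seq.upper()) - IUPAC_DNA
            if invalid:
                raise ValueError(f"{name}: invalid characters {sorted(invalid)}")
    return lengths.pop()
```

In this sketch, skipping the character check avoids a second pass over every sequence, which is where the time saving would come from, while the cheaper length comparison is retained.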
Table 1. Feature comparison among SEGUL v0.19.0, AMAS v1.02, and goalign v0.3.5.