2.4 | Sequence analyses
First, sequence quality filtering and adapter removal were performed
using Cutadapt (Martin, 2011: cutadapt -q 10 -a AGATCGGAAGAGC
–minimum-length 100). Next, sequence sub-sampling was performed to
equalize the sequence reads for all samples (Li, 2021: seqtk sample
-s100), with 370,000 sequences from each sample. Next, the transcripts
were assembled using Mira 4.9.6 (Chevreux et al. 1999). The manifest
file of the assembler is available as supplemental material (Methods
S1). The assembled contigs were assigned to one of the thirteen
mitochondrial proteins, either small or large mitochondrial rRNA. For
the gene assignment, BLAST+ (Camacho et al. 2009) in conjunction with
MIDORI (Machida et al. 2017; Leray et al., 2022) and SILVA (Quast et al.
2013) references were performed. SILVA was included as a reference to
count nuclear rRNA sequence contaminations, although no nuclear rRNA
sequences were detected in our libraries. Commands for the BLAST
database construction and the execution of BLAST searches are available
as supplemental material (Methods S2). Following BLAST homology
searches, the contigs were assigned to each target gene using four
criteria. Queries were removed when (i) the top-listed hit was not the
target gene, (ii) three or more genes were listed as hits, (iii) only
one subject remained in the BLAST result and (iv) we removed the listed
subjects from the BLAST results when the e-value exceeded 1e-4. The Perl
program for this assignment is available as supplemental material
(Methods S3). The target mitochondrial gene contigs were selected from
the Mira 4 assembled fasta using the following command: [awk
’NR==FNR{a[$0];next} $1.00 in a{c=2} c&&c–’
’Perl_list.txt’ ’Mira.unpadded.fasta’]. ’Perl_list.txt’ is one of
the output files from the above Perl program.