2.4 | Sequence analyses
First, sequence quality filtering and adapter removal were performed using Cutadapt (Martin, 2011: cutadapt -q 10 -a AGATCGGAAGAGC –minimum-length 100). Next, sequence sub-sampling was performed to equalize the sequence reads for all samples (Li, 2021: seqtk sample -s100), with 370,000 sequences from each sample. Next, the transcripts were assembled using Mira 4.9.6 (Chevreux et al. 1999). The manifest file of the assembler is available as supplemental material (Methods S1). The assembled contigs were assigned to one of the thirteen mitochondrial proteins, either small or large mitochondrial rRNA. For the gene assignment, BLAST+ (Camacho et al. 2009) in conjunction with MIDORI (Machida et al. 2017; Leray et al., 2022) and SILVA (Quast et al. 2013) references were performed. SILVA was included as a reference to count nuclear rRNA sequence contaminations, although no nuclear rRNA sequences were detected in our libraries. Commands for the BLAST database construction and the execution of BLAST searches are available as supplemental material (Methods S2). Following BLAST homology searches, the contigs were assigned to each target gene using four criteria. Queries were removed when (i) the top-listed hit was not the target gene, (ii) three or more genes were listed as hits, (iii) only one subject remained in the BLAST result and (iv) we removed the listed subjects from the BLAST results when the e-value exceeded 1e-4. The Perl program for this assignment is available as supplemental material (Methods S3). The target mitochondrial gene contigs were selected from the Mira 4 assembled fasta using the following command: [awk ’NR==FNR{a[$0];next} $1.00 in a{c=2} c&&c–’ ’Perl_list.txt’ ’Mira.unpadded.fasta’]. ’Perl_list.txt’ is one of the output files from the above Perl program.