Biodiversity studies greatly benefit from molecular tools, such as DNA metabarcoding, which provides an effective identification tool in biomonitoring and conservation programmes. The accuracy of species-level assignment, and consequent taxonomic coverage, relies on comprehensive DNA barcode reference libraries. The role of these libraries is to support species identification, but accidental errors in the generation of the barcodes may compromise their accuracy. Here we present an R-based application, BAGS (Barcode, Audit & Grade System), that performs automated auditing and annotation of cytochrome c oxidase subunit I (COI) sequence libraries available in the Barcode of Life Data System (BOLD) for a given taxonomic group of animals. It then applies a qualitative ranking system that assigns one of five grades (A to E) to each species in the reference library, according to the attributes of the data and the congruency of species names with sequences clustered in Barcode Index Numbers (BINs). Our ultimate goal is to allow researchers to obtain the most useful and reliable data, highlighting and segregating records according to their congruency. Several tests were performed to assess its usefulness and limitations. BAGS fills a significant gap in the current landscape of DNA barcoding research tools by quickly screening reference libraries to gauge the congruence status of data and facilitate the triage of ambiguous data for subsequent review. BAGS thereby has the potential to become a valuable addition to forthcoming DNA metabarcoding studies, contributing in the long term to improving the quality and reliability of public reference libraries worldwide.
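The idea of grading species by the congruency between species names and BINs can be illustrated with a minimal Python sketch. The abstract states only that five grades (A to E) are assigned from data attributes and name/BIN congruency; the specific rules below, the record thresholds (`min_records`, `well_sampled`) and the function name `grade_species` are assumptions chosen for illustration, not the criteria implemented in BAGS.

```python
from collections import defaultdict


def grade_species(records, min_records=3, well_sampled=10):
    """Assign an illustrative grade (A to E) to each species.

    records: iterable of (species_name, bin_id) pairs, one per barcode record.
    The rules and thresholds are assumptions made for this sketch only.
    """
    bins_by_species = defaultdict(set)
    species_by_bin = defaultdict(set)
    counts = defaultdict(int)
    for species, bin_id in records:
        bins_by_species[species].add(bin_id)
        species_by_bin[bin_id].add(species)
        counts[species] += 1

    grades = {}
    for species, bins in bins_by_species.items():
        # Discordant: at least one BIN is shared with another species name.
        shares_bin = any(len(species_by_bin[b]) > 1 for b in bins)
        if shares_bin:
            grades[species] = "E"
        elif counts[species] < min_records:
            grades[species] = "D"  # too few records to judge
        elif len(bins) > 1:
            grades[species] = "C"  # one name split across several BINs
        elif counts[species] > well_sampled:
            grades[species] = "A"  # one exclusive BIN, well sampled
        else:
            grades[species] = "B"  # one exclusive BIN, fewer records
    return grades


# Hypothetical records: species and BIN identifiers are made up.
example = (
    [("sp1", "BIN1")] * 11
    + [("sp2", "BIN2")] * 3
    + [("sp3", "BIN3"), ("sp3", "BIN4"), ("sp3", "BIN3")]
    + [("sp4", "BIN5")]
    + [("sp5", "BIN6"), ("sp6", "BIN6")]
)
example_grades = grade_species(example)
```

In this sketch, discordance is checked first, so a species sharing a BIN with another name is flagged for review regardless of how many records it has.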
Microsporidia are obligate intracellular eukaryotic parasites that infect nearly all animal groups, including humans. The most common molecular methods for Microsporidia detection rely on species-targeting qPCR or end-point PCR using group-specific primers. However, these methods may not be specific enough or may fail in cases of mixed infections. We developed a method for parallel detection of both microsporidian infection and the host species. We designed new primer sets: one specific for the classical Microsporidia (targeting the hypervariable V5 region of ssu rDNA), and a second one targeting a shortened fragment of the COI gene (standard metazoan DNA-barcode); both markers are well suited for an NGS approach. Analysis of an ssu rDNA dataset representing 607 microsporidian species (120 genera) indicated that the V5 region enables identification of >98% of species in the dataset (596/607). To test the method, we used microsporidians that infect mosquitoes in natural populations. Using mini-COI data, all field-collected mosquitoes were unambiguously assigned to seven species; among them almost 60% of specimens (127/212) were positive for at least 11 different microsporidian species, including a new microsporidian ssu rDNA sequence (Microsporidium sp. PL01). Phylogenetic analysis of Microsporidium sp. PL01 ssu rDNA showed that this species belongs to one of the two main clades in the Terresporidia. In addition, the level of microsporidian mixed infections was relatively high (9.4%). The numbers of sequence reads for the OTUs suggest that the occurrence of Nosema spp. in co-infections could benefit them; however, this observation should be re-tested using more intensive host sampling. The proposed method for detection of Microsporidia can be applied to all types of DNA extracts, including medical and environmental samples.
Samia ricini, a gigantic saturniid moth, has the potential to be a novel lepidopteran model species. Because S. ricini is much tougher and more resistant to diseases than the current model species Bombyx mori, it is easier to rear. In addition, genetic resources available for S. ricini rival or even exceed those for B. mori: at least 26 eco-races of S. ricini are reported, and S. ricini can hybridise with wild Samia species, which are distributed throughout Asian countries, and produce fertile progeny. Physiological traits such as food preference, integument colour and larval spot pattern differ between S. ricini and wild Samia species, making them suitable targets for forward genetic analysis. To facilitate genetic research in S. ricini, we determined its whole genome sequence. The assembled genome of S. ricini was 458 Mb in 155 scaffolds, and the N50 length of the assembly was approximately 21 Mb. A total of 16,702 protein-coding genes were predicted in the assembly. Although the gene repertoire of S. ricini did not differ greatly from that of B. mori, some genes, such as chorion genes and fibroin genes, seemed to have evolved specifically in S. ricini.
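The N50 statistic quoted for the assembly can be made concrete with a short Python sketch. It uses the standard definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length); the function name `n50` and the toy lengths are illustrative and are not taken from the S. ricini assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum covers half the total.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy scaffold lengths (arbitrary units), chosen only for illustration.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Because the statistic depends only on the sorted lengths, the same function applies to contig N50 and scaffold N50 alike.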
Measuring biological diversity is a crucial but difficult undertaking, as exemplified in oaks, where complex morphological, ecological, biogeographic and genetic differentiation patterns collide with a traditional taxonomy that measures biodiversity in number of species (or higher taxa). In this pilot study, we generated High-Throughput Sequencing (HTS) amplicon data of the intergenic spacer of the 5S nuclear ribosomal DNA cistron (5S-IGS) in oaks, using six mock samples that differ in geographic origin, species composition, and pool complexity. The potential of the marker for automated geno-taxonomy applications was assessed using a reference dataset of 1770 5S-IGS cloned sequences, covering the entire taxonomic breadth and distribution range of western Eurasian Quercus, and applying similarity (BLAST) and evolutionary approaches (ML trees and EPA). Both methods performed equally well, correctly identifying species in sections Ilex and Cerris in the pure and mixed samples, and the main genotypes shared by species of sect. Quercus. Application of different cut-off thresholds revealed that medium-high abundance sequences (>10 or 25) suffice for a clear species identification of samples containing one or few individuals. Lower thresholds identify phylogenetic correspondence with all target species in highly mixed samples (analogous to environmental bulk samples) and include rare variants pointing towards reticulation, incomplete lineage sorting, pseudogenic 5S units, and in-situ (natural) contamination. Our pipeline is highly promising for future assessments of intra-specific and inter-population diversity, and of the genetic resources of natural ecosystems, which are fundamental to empowering fast and solid biodiversity conservation programs worldwide.
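The effect of an abundance cut-off on amplicon data can be sketched in a few lines of Python. The sketch below counts identical sequence variants and keeps those whose abundance is strictly greater than a threshold; reading the cut-offs (>10 or 25) as strict inequalities on raw read counts, along with the function name `filter_by_abundance` and the toy reads, is an assumption for illustration and not a description of the authors' pipeline.

```python
from collections import Counter


def filter_by_abundance(reads, min_abundance):
    """Keep sequence variants observed more than min_abundance times.

    reads: iterable of sequence strings, one per read.
    Returns a dict mapping each retained variant to its read count.
    """
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n > min_abundance}


# Toy reads (made-up sequences): one abundant, one medium, one rare variant.
toy_reads = ["ACGT"] * 26 + ["ACGA"] * 11 + ["TTTT"] * 2
kept_over_10 = filter_by_abundance(toy_reads, 10)
kept_over_25 = filter_by_abundance(toy_reads, 25)
```

Lowering the threshold retains the rare variant as well, which mirrors the trade-off described above between clear identification and sensitivity to rare sequences.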
The leopard coral grouper, Plectropomus leopardus, belonging to genus Plectropomus, subfamily Epinephelinae, is a carnivorous coral reef fish widely distributed in the tropical and subtropical waters of the Indo-Pacific. Owing to its appealing appearance and delicious taste, P. leopardus has become a popular commercial fish for aquaculture in many countries. However, the lack of genomic and molecular resources for P. leopardus hinders biological studies and genomic breeding programs. Here we report the de novo sequencing and assembly of the P. leopardus genome using 10× Genomics and high-throughput chromosome conformation capture (Hi-C) technologies. Using 127.36 Gb of 10× Genomics data, we generated a 902.90 Mb genome assembly with a contig and scaffold N50 of 31.8 Kb and 33.47 Mb, respectively. The scaffolds were clustered and oriented into 24 pseudo-chromosomes with 13.39 Gb of valid Hi-C data. BUSCO analysis showed that 95.3% of the conserved single-copy genes were retrieved, indicating high completeness of the assembly. We predicted 23,234 protein-coding genes, of which 96.5% were functionally annotated. The P. leopardus genome provides a valuable genomic resource for genetic, evolutionary and biological studies of this species. In particular, it is expected to benefit the development of genomic breeding programs in the farming industry.
Gene annotation is a critical bottleneck in genomic research, especially for the comprehensive study of very large gene families in the genomes of non-model organisms. Despite recent progress in automatic methods, state-of-the-art tools used for this task often produce inaccurate annotations, such as fused, chimeric, partial or even completely absent gene models for many family copies, errors that require considerable extra effort to correct. Here we present BITACORA, a bioinformatics solution that integrates popular sequence similarity-based search tools and Perl scripts to facilitate both the curation of these inaccurate annotations and the identification of previously undetected gene family copies directly in genomic DNA sequences. We tested the performance of BITACORA in annotating the members of two chemosensory gene families with different repertoire sizes in seven available genome sequences, and compared its performance with that of Augustus-PPX, a tool also designed to improve automatic annotations using a sequence similarity-based approach. Despite the relatively high fragmentation of some of these drafts, BITACORA was able to improve the annotation of many members of these families and detected thousands of new chemoreceptors encoded in genome sequences. The program creates general feature format (GFF) files, with both curated and newly identified gene models, and FASTA files with the predicted proteins. These outputs can be easily integrated into genomic annotation editors, greatly facilitating subsequent manual annotation and downstream evolutionary analyses.
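To make the GFF output mentioned above more tangible, the following Python sketch formats a single feature as a GFF3-style record with the nine standard tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The abstract does not state which GFF version BITACORA writes, so GFF3 is an assumption here, and the function name `gff3_line`, the source label and the gene identifier are hypothetical.

```python
def gff3_line(seqid, source, feature_type, start, end, strand,
              attributes, score=".", phase="."):
    """Format one feature as a GFF3-style line (1-based, inclusive coordinates)."""
    if start < 1 or end < start:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    # Attributes are written as semicolon-separated key=value pairs.
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, feature_type, str(start), str(end),
               str(score), strand, str(phase), attr_text]
    return "\t".join(columns)


# Hypothetical gene model on a made-up scaffold.
example_line = gff3_line("scaffold_1", "sketch", "gene", 100, 950, "+",
                         {"ID": "gene0001"})
```

A real annotation file would also carry child features such as mRNA and CDS records linked through a `Parent` attribute; they are omitted to keep the sketch short.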
Sarcophaga peregrina is of great ecological, medical and forensic significance, and has distinctive biological characteristics such as an ovoviviparous reproductive pattern and adaptation to feeding on carrion. However, the underlying mechanisms remain unresolved owing to the lack of a high-quality genome. Here we present a de novo-assembled, chromosome-scale genome for S. peregrina. The final assembled genome was 560.31 Mb with a contig N50 of 3.84 Mb. Hi-C scaffolding reliably anchored six pseudochromosomes, accounting for 97.76% of the assembled genome. Repeat elements made up 45.70% of the genome. A total of 14,476 protein-coding genes were functionally annotated, accounting for 92.14% of all predicted genes. Phylogenetic analysis indicated that S. peregrina and S. bullata diverged ~7.14 Mya. Comparative genomic analysis revealed expanded and positively selected genes related to biological features that help explain its ovoviviparous reproduction and necrophagous habit, such as chorionic membrane formation, dorso-ventral axis formation, lipid metabolism, and olfactory receptor activity. This study provides a valuable genomic resource for S. peregrina and offers insight into the molecular mechanisms underlying its adaptive evolution.