Introduction

Lifespan is the approximate maximum age that individuals of a given species are expected to attain under favourable environmental conditions. Derivations of a species’ lifespan are varied, including the maximum recorded age of any single individual, the age to which a proportion of the population survives, or, in fish, the age at which 95 % of the maximum or asymptotic length is reached. However it is derived, lifespan is a fundamental life history parameter, allowing for approximation of mortality and rate of population growth. Lifespan can also provide an upper limit for an animal’s reproductive life phase, except in the small number of species that undergo reproductive senescence. The age at which sexual maturity is attained and either age at death or age of reproductive senescence vary more extensively than maximum lifespan, and rates of reproduction and mortality even more so. Lifespan, in contrast, is a relatively stable trait within a given species and can therefore be used to obtain generalisable information about that species.
Lifespan’s utility in approximating life history makes it valuable for species management. For example, it can be used to model sustainable harvest levels for wild populations, such as in fisheries, and to assess invasion potential and extinction risk. Despite its simplicity as a population parameter and its great value for a range of animal population and species management applications, lifespan is often not considered because no reliable estimates are available. Reported vertebrate lifespans range from eight weeks in the coral reef pygmy goby (Eviota sigillata) to approximately 400 years in the Greenland shark (Somniosus microcephalus). Identification of the oldest individuals of a given species is often very difficult because age information is sparse or absent. Long-lived species present a range of practical difficulties for determining lifespan because, in the absence of indirect estimation methods, research programmes rarely last as long as the oldest individuals. Thus, despite its central importance to species management and conservation, lifespan is unknown for most animals.
The ageing process is hypothesised to be an unintended consequence of cell programming, involving molecular changes that leave traceable genomic signatures. Consistent changes in a well-studied epigenetic modification, DNA methylation, can be used to predict age in a growing number of species. This is because, over the lifespan of an individual, patterns of DNA methylation change, whereby highly methylated regions become demethylated and sparsely methylated regions become methylated. Along with other important epigenetic changes, these changes in DNA methylation result in a loss of cellular functioning that is thought to contribute to processes of ageing. The term DNA methylation is generally used to refer to methylation that occurs at cytosine-phosphate-guanine (CpG) sites, or ‘CG’ sequences in the genome, where its occurrence and function have been most extensively studied. CpG sites are concentrated around transcription start sites and in promoter regions of genes, where their density and DNA methylation levels are associated with changes in gene activity. The elevated frequency of CpG sites in gene promoters has been hypothesised to act as a buffer against age-related DNA methylation changes and therefore to correlate with species’ maximum lifespan.
The association between promoter CpG density and lifespan was first revealed in mammals, and its predictive value was subsequently demonstrated among all vertebrates. McLain and Faulk (2018) revealed significant correlations between promoter CpG density and mammalian lifespan for 1000 gene promoter regions, 5 % of the total examined. Mayne et al. (2019) developed a model that used the CpG densities of 42 gene promoters to predict lifespan in vertebrates, accounting for 76 % of the variation between known and predicted lifespans. The vertebrate model highlighted unique relationships between CpG density and lifespan in all major vertebrate groups, including fish, birds, mammals and reptiles. However, prediction accuracy was lower in non-mammalian vertebrates, and these differences were attributed to low sample size (n ≤ 63) and high sequence divergence. Previous lifespan analyses have used human gene promoters as reference sequences, resulting in fewer sequence matches, greater bias and lower accuracy in distant relatives. Previous studies have also obtained lifespan information from the Animal Aging and Longevity Database (AnAge). Although AnAge is a highly comprehensive and well-curated database, incorporation of lifespan data from additional sources (e.g., alternative online databases or manual literature search) is likely to enable increased sample sizes and improve statistical power.
Fish (aquatic vertebrates with fins and gills) are a paraphyletic group including the classes Actinopteri (ray-finned fishes), Chondrichthyes (cartilaginous fishes), Sarcopterygii (fleshy-finned fishes), Cephalaspidomorphi (e.g., lampreys) and Myxini (e.g., hagfishes). At present, approximately 7000 fish species are subject to wild harvest, each typically requiring species-specific life history information to enable adequate fisheries management. A lack of data for the majority of fished species significantly impedes management of sustainable fisheries, with an estimated 35 % of global fish stocks now overfished. Lifespan data are of particularly high value for management of fish populations, as they can be used to approximate natural mortality rates and fisheries maximum sustainable yield, and to model population growth.
Here we report the development of a fish-specific genomic lifespan predictor. The model was constructed using 1804 reported lifespan values and the CpG density (measured as the CpG observed/expected ratio) of promoter regions from 442 fish genomes, with promoter regions extracted using experimentally defined zebrafish (Danio rerio) promoter sequences. The model predicts lifespan for any given fish species from the genome sequence of a single individual, demonstrating the high value of promoter CpG density alone to predict lifespan in fish.
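To make the CpG density measure concrete, the following is a minimal sketch of how a CpG observed/expected ratio can be computed for a single promoter sequence. It assumes the commonly used definition, (number of CG dinucleotides × sequence length) / (number of C × number of G); the exact formulation used in this study is not stated in this section, and the function name cpg_observed_expected and the example sequences are hypothetical.

```python
def cpg_observed_expected(sequence: str) -> float:
    """Return the CpG observed/expected ratio of a DNA sequence.

    Assumed definition (not confirmed by the text above):
        O/E = (count of 'CG' * sequence length) / (count of 'C' * count of 'G')
    """
    seq = sequence.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    # 'CG' cannot overlap with itself, so str.count finds every CpG site.
    cpg_count = seq.count("CG")

    # Without both C and G no CpG site is expected; report zero density.
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count * length / (c_count * g_count)


if __name__ == "__main__":
    # Hypothetical toy sequences, far shorter than real promoter regions.
    for name, seq in [("cpg_rich", "ACGTCGCG"), ("balanced", "CCGG"), ("no_cpg", "AAAA")]:
        print(name, round(cpg_observed_expected(seq), 3))
```

A ratio near 1 indicates that CpG sites occur about as often as expected from the base composition, whereas lower values indicate CpG depletion. In a model of the kind described here, one such value per promoter would form one predictor variable for a species.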