Comparison of telomere length estimates from WGS data
Telomere length estimates differed significantly, on average, among the
bioinformatic approaches (H = 230.06, df = 2, p < 0.001; Figure 2). However, despite these
differences, individual genotype measures were highly correlated across
approaches (r = 0.86 – 0.99). This suggests that, regardless of the
approach, estimates rank genotypes comparably, although the scale of the
estimates differs on average. Variation in telomere length estimates may be
attributed to differences in the bioinformatic approaches, including
telomere identification (i.e., alignment or pattern-matching approach),
the minimum number of consecutive telomeric repeats required in a read, and
consideration of genome coverage. Computel was initially designed to
estimate mean telomere length in humans but allows species-specific
modification of genome features, including genome size, number of
chromosomes, and telomere sequence, to allow estimation across organisms.
Here, we leveraged Computel to estimate telomere length in plants.
Computel uses an alignment-based method by mapping reads from WGS data
to a telomere reference created within the program. Only reads that
align with the telomere reference are considered telomeric reads. In
contrast, K-seek was not developed to estimate telomere length but was
created to identify and count simple sequence repeats from WGS in
Drosophila. We leveraged K-seek to estimate telomere length by
identifying and counting the number of predicted Populus telomere
repeats from WGS within each genotype. K-seek considers short repeats
that span at least 50 bp within a read, so only reads containing at
least seven telomeric repeats were identified in the analysis. This
approach decreases the probability of capturing
interstitial telomeres, which are telomeric repeats localized to
intrachromosomal sites. However, unlike Computel, K-seek does not
account for other parameters known to influence telomere estimates,
such as genome coverage. Similarly, TRIP identifies short tandem repeat
sequences from WGS, but this program was specifically created for
telomere identification in insects. TRIP detects reads containing more
than four telomeric repeats and, like K-seek, does not consider
genome coverage in telomere length estimates. Accounting for genome
coverage can influence average telomere length estimates by reducing
potential sequencing biases. Nersisyan and Arakelyan (2015) compared human
telomere length estimates from short-read sequence data at varying levels
of coverage across the same individuals (0.2X, 2X, and 10X). They observed
that the accuracy of telomere estimates improved with higher genome
coverage. In our study, we removed one individual from the analyses as
an outlier because it exhibited low genome coverage (< 12.31X;
Table S1). Individuals assessed in our study had a minimum genome
coverage of 15X, suggesting that this may be a reasonable requirement for
precisely estimating telomere length using WGS (Table S1). However,
further studies comparing the impact of varying genome coverage in
telomere estimations are needed to support this recommendation in
plants. Therefore, while there may be benefits to the pattern-matching approach
used in K-seek and TRIP, ensuring that genome coverage is considered
will be essential for future comparisons.
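
To make the distinction between repeat thresholds and coverage concrete, the following minimal Python sketch illustrates the general pattern-matching idea discussed above. It is not the implementation used by K-seek, TRIP, or Computel. The repeat motif (TTTAGGG, the Arabidopsis-type plant telomere repeat) is an assumption, since the motif is not stated here, and the function names and the simple division by coverage are hypothetical choices made for illustration only.

```python
import re

# Assumption: Arabidopsis-type plant telomere repeat; the motif used in the
# study is not given in the text.
TELOMERE_REPEAT = "TTTAGGG"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_pattern(repeat: str, min_repeats: int) -> re.Pattern:
    """Compile a regex matching at least `min_repeats` consecutive copies
    of `repeat` on either strand.

    Simplification: only copies in the canonical phase are counted, so a
    run that starts mid-motif yields one fewer full copy.
    """
    fwd = f"(?:{repeat}){{{min_repeats},}}"
    rev = f"(?:{reverse_complement(repeat)}){{{min_repeats},}}"
    return re.compile(f"{fwd}|{rev}")


def count_telomeric_reads(reads, repeat=TELOMERE_REPEAT, min_repeats=7):
    """Count reads containing at least `min_repeats` consecutive repeats."""
    pattern = build_pattern(repeat, min_repeats)
    return sum(1 for read in reads if pattern.search(read.upper()))


def coverage_normalized_count(telomeric_reads: int, coverage: float) -> float:
    """Scale a telomeric read count by genome coverage.

    Hypothetical normalization for illustration; it is not the formula
    used by any of the programs named in the text.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return telomeric_reads / coverage


if __name__ == "__main__":
    # Toy reads, invented for this example.
    toy_reads = [
        "TTTAGGG" * 7,                # seven forward-strand repeats
        "ACGT" * 10 + "TTTAGGG" * 5,  # only five repeats
        "CCCTAAA" * 8,                # eight reverse-strand repeats
        "ACGTACGTAC",                 # no repeats
    ]
    strict = count_telomeric_reads(toy_reads, min_repeats=7)
    lenient = count_telomeric_reads(toy_reads, min_repeats=5)
    print(strict, lenient, coverage_normalized_count(strict, 15.0))
```

In this sketch, a threshold of seven repeats mirrors the K-seek setting described above, and a threshold of five corresponds to "more than four" repeats as described for TRIP. Lowering the threshold admits more reads, and the same raw count maps to a different normalized value at a different sequencing depth, which is one way differences in scale among approaches could arise.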