Discussion
In the present study, we implemented a machine learning (ML) approach to
investigate the transcriptomic signatures arising from a complex plastic
phenotype. We explored the gene expression profiles of Apis mellifera
associated with dance behaviour in order to identify a set of focal
genes that could be key regulators of this complex behaviour. By
training two embedded models (SVM and GLMNET) and one wrapper algorithm
(RFE-RF), we achieved perfect accuracy in assigning honeybees to the two
behavioural classes that we tested (“dancer” vs. “non-dancer”) on the
basis of gene expression data. Using feature selection, we obtained a
set of key predictors for each classifier, which were then distilled
into a list of genes. Our results identify a set of promising candidate
genes that could directly regulate dance behaviour.
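One simple way to distil per-classifier predictor lists into a single
gene list is to count how many classifiers selected each gene. The
following is a minimal sketch of that idea in Python, not necessarily
the procedure used in this study; the gene identifiers, the `selected`
dictionary and the `min_votes` threshold are hypothetical placeholders.

```python
from collections import Counter


def consensus_genes(selected, min_votes=2):
    """Return (gene, votes) pairs for genes chosen by at least
    `min_votes` classifiers, sorted by votes (descending), then name."""
    # Count each gene once per classifier, even if a list repeats it.
    votes = Counter(g for genes in selected.values() for g in set(genes))
    kept = [(g, n) for g, n in votes.items() if n >= min_votes]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical predictor lists, one per classifier (placeholder gene IDs).
selected = {
    "SVM": ["geneA", "geneB", "geneC"],
    "RFE-RF": ["geneA", "geneB", "geneD"],
    "GLMNET": ["geneA", "geneE"],
}

consensus = consensus_genes(selected, min_votes=2)
```

Raising `min_votes` to the number of classifiers keeps only the genes
shared by all of them, which is the strictest form of consensus.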
While we were able to clearly separate dancers from non-dancers, our
preliminary analyses (PCA) were unable to detect any major effects of
distance perception, which was one of the research questions that we had
initially pursued. Although the impact of distance perception on gene
expression was too subtle to be detected by our approaches, other
studies have succeeded in identifying the effect of distance perception
alone on honeybee brain gene expression using more traditional
statistical tools of transcriptomic analysis. It is possible that, with
a larger sample size, we would have been able to investigate this
behaviour further. Alternatively, the transcriptomic signature
associated with distance perception might be more pronounced in
honeybees experiencing real distance than in those experiencing the
perceived distance produced by our tunnel manipulation setup. Indeed, a
larger set of genes was found to differ between foragers experiencing
real long vs. short distance (Manfredini et al., in prep.), but we
cannot exclude that a portion of these genes changed their patterns of
expression between groups owing, for example, to the different metabolic
costs of flight.
A recurrent problem in transcriptomic analyses is that the number of
genes (predictors) is far greater than the number of samples. The fact
that different ML and differential gene expression analyses yielded an
overlapping subset of genes in this setting suggests that dance
behaviour has a unique and well-defined transcriptomic signature.
Moreover, our results further show that ML can be used as a
complementary approach to support transcriptomic studies and to help
restrict the focus to a smaller group of genes that can be investigated
for their association with a behaviour of interest. This approach can
also be extended beyond honeybees and the waggle dance: many animal
behaviours are transient and plastic, which makes them very difficult to
characterise, but they can be described by combining traditional gene
expression analyses and ML approaches.
Interestingly, the overlap between SVM and RFE-RF was greater than the
overlap of either with GLMNET. In high-dimensional settings such as
ours, there can be more than one optimum for the learning objective,
which is why ML models can end up in different final states on
subsequent runs with random initialisation. In our case, it is possible
that GLMNET simply reached a different state from those reached by SVM
and RFE-RF, and hence selected different features. It is also possible
that, since GLMNET has built-in mechanisms to avoid overfitting, its
learning procedure was slower than that of the other methods and simply
needed more training time.
Nevertheless, the extensive overlap between the SVM and RFE-RF
approaches, and the fact that many of the identified genes were also
recovered by traditional methods of gene expression analysis, are
promising. We hypothesise that these genes are the best predictors of
dance behaviour, as they all appear to be expressed at higher levels in
dancers than in non-dancers. In particular, the two genes that are
common to all three ML approaches and were also identified by at least
one approach of gene expression analysis deserve special attention.
Boss (bride of sevenless) belongs to the group of G-protein coupled
receptors, an important family of genes often associated with the
expression of behaviour in insects. In particular, boss has been
associated with a range of functions in Drosophila, including sight and
eye development, energy homeostasis and response to glucose. Boss might
have been co-opted in honeybees to regulate dance behaviour, an
energetically expensive activity that is closely related to feeding
behaviour (and therefore to sugar response) and relies on visual input
for orientation during flight to a foraging site. As for heterogeneous
nuclear ribonucleoprotein A1, studies on the Drosophila orthologue HRP59
have revealed a role for this gene in alternative splicing, a molecular
process that allows a single gene to give rise to multiple protein
variants, significantly increasing the repertoire of responses to a
stimulus. Even though the role of alternative splicing in the regulation
of behaviour is largely unknown, this process has started to be
characterised in multiple organisms, including honeybees, hinting at the
possibility that the honeybee orthologue of HRP59 might contribute to
the high plasticity that is necessary to regulate a complex behaviour
such as the waggle dance.
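The observation above that SVM and RFE-RF share more features with each
other than with GLMNET can be made quantitative with a pairwise overlap
measure. The snippet below is an illustrative sketch using the Jaccard
index (shared genes divided by all genes selected by either method); the
choice of index and the placeholder gene lists are assumptions, not
values or methods taken from this study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(selected):
    """Map each pair of method names (alphabetical order) to its Jaccard index."""
    return {
        (m1, m2): jaccard(selected[m1], selected[m2])
        for m1, m2 in combinations(sorted(selected), 2)
    }


# Hypothetical feature sets (placeholder gene IDs, not real results).
selected = {
    "SVM": ["geneA", "geneB", "geneC"],
    "RFE-RF": ["geneA", "geneB", "geneD"],
    "GLMNET": ["geneA", "geneE"],
}

overlap = pairwise_overlap(selected)
```

With these placeholder lists, the SVM and RFE-RF pair has the highest
index, mirroring the qualitative pattern described in the text.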
More functional approaches are needed to move beyond correlation and
investigate whether a causal link exists between the expression levels
of the genes that we identified and the performance of dance behaviour.
If further research were to support this, the results could then be used
to test the recruitment potential of a specific colony. By designing a
diagnostic tool to directly measure the levels of expression of the
focal genes and compare them against a reference, it would be possible
to assess the overall ability of a colony to recruit to a foraging site
through dancing.
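As a purely illustrative sketch of such a comparison against a
reference, the snippet below scores a sample by the mean log2 ratio of
focal-gene expression relative to a reference profile. The gene names,
expression values, pseudocount and scoring rule are all assumptions made
for illustration; no such tool or threshold has been validated.

```python
import math


def dance_score(sample, reference, pseudocount=1.0):
    """Mean log2 ratio of sample to reference expression over the
    focal genes present in both profiles (hypothetical scoring rule)."""
    genes = sorted(set(sample) & set(reference))
    if not genes:
        raise ValueError("no focal genes shared between sample and reference")
    ratios = [
        # The pseudocount avoids division by zero for unexpressed genes.
        math.log2((sample[g] + pseudocount) / (reference[g] + pseudocount))
        for g in genes
    ]
    return sum(ratios) / len(ratios)


# Hypothetical expression values (arbitrary units) for two focal genes.
reference = {"focal_gene_1": 7.0, "focal_gene_2": 15.0}
colony = {"focal_gene_1": 15.0, "focal_gene_2": 31.0}

score = dance_score(colony, reference)
```

Under this rule, a positive score means that the focal genes are, on
average, expressed at higher levels in the sample than in the reference,
and a score of zero means no average difference.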