Introduction
The complex relationship between genes and behaviour has fuelled a large
body of recent research, and we now know that gene activity can influence
brain function, which in turn may affect behaviour. Several studies
have shown that behavioural states (distinct and well-characterised
behaviours such as foraging or defensive behaviour) can be associated
with distinct gene expression profiles in neural tissue, representing
the basis for a neurogenomic approach. For example, large gene networks have
been associated with foraging and defensive behaviour in honeybees, and
numerous candidate neurological genes have been linked to aggression in
a variety of organisms, including honeybees and zebrafish. Nonetheless,
most studies have focused on behavioural states that are long lasting or
inherent to a species (Zayed & Robinson, 2012), whereas more plastic
and transient social interactions among members of the same species (or
colony) have been less well characterised at the neurogenomic level. This is
likely due to the challenges associated with combining accurate
behavioural observations with complex experimental designs to obtain and
analyse large sets of gene expression data.
The Western honeybee Apis mellifera has become a model organism
for neurogenomics due to its fascinating sociobiology, the ecosystem
services it provides as a pollinator and the availability of a fully
annotated genome. Honeybees display perhaps one of the most iconic
social behaviours in the animal world, the “waggle dance”, in which
foragers communicate the location of suitable food sources and possible
nest locations to nestmates via stereotyped movements. This complex
behaviour was first described in the last century by the
Nobel Prize-winning ethologist Karl von Frisch, and since then many
details of its ecological, evolutionary, and physiological underpinnings
have been characterised. Despite this, we still do not
have a complete picture of how the waggle dance is regulated at the
brain level. Pioneering studies have started to reveal some of the key
players at the levels of molecules, cell types and genetic pathways
associated with dance communication, but it is unclear which genes in the
honeybee brain trigger the performance of dance behaviour once
activated.
Traditionally, the neurogenomic approach has consisted of using
statistical methods to calculate differential gene expression, which
requires robust data analysis techniques due to the large volumes of
sequence reads generated per sample. An interesting development in the
field to address the increased computational needs of these approaches
has been the application of Machine Learning (ML) to genomics studies.
ML methodologies have proved to be powerful resources for this purpose
and have recently been the focus of extensive research into new
applications across a wide range of fields in biology
and medicine. Despite the abundance of studies applying ML frameworks
to transcriptomic data, their use to characterise the molecular regulation
of highly plastic and transient behaviours has not yet been properly
explored.
In this study, we set out to identify the genes associated with the
performance of dance behaviour in honeybee foragers using an ML approach.
We obtained a transcriptomic dataset of brain tissues (mushroom bodies)
from honeybee foragers that were sampled for another study designed to
investigate the molecular basis for learning distance and direction through
the waggle dance (Manfredini et al., in prep.). Mushroom bodies
were targeted for this study as they are the brain tissue best suited to
exploring higher cognitive functions in insects, including spatial tasks.
We trained three types of ML models (classifiers) on the activation
levels of 15,314 genes that correspond to the whole honeybee genome,
with the direct goal of classifying honeybees according to whether or
not they performed a waggle dance upon their return from a foraging trip
(i.e., dancers vs non-dancers). Thereafter, we unified the information
obtained from the different ML approaches to identify the genes
associated with these complex behavioural states, and we compared these
results with more traditional analyses of gene expression. Together, our
study provides deeper insight into the molecular regulation of the
waggle dance, a plastic and transient behavioural state, and promotes
the incorporation of ML into the analysis of transcriptomic data.
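The idea of unifying information from several classifiers can be illustrated with a minimal sketch. This introduction does not specify which three classifiers were trained or how their outputs were combined, so the example below makes assumptions: each model is taken to yield a per-gene importance score, and the scores are combined by averaging each gene's rank across models (mean-rank aggregation). The model names, gene names and scores are hypothetical toy values, not results from this study.

```python
from statistics import mean


def rank_genes(importances):
    """Map each gene to its rank (1 = most important) within one model."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def consensus_ranking(per_model_importances):
    """Order genes by mean rank across models (lower = stronger consensus)."""
    per_model_ranks = [rank_genes(imp) for imp in per_model_importances.values()]
    genes = per_model_ranks[0].keys()
    mean_rank = {g: mean(r[g] for r in per_model_ranks) for g in genes}
    # Ties in mean rank are broken alphabetically by gene name.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Hypothetical importance scores from three unnamed models (toy data only).
toy_importances = {
    "model_a": {"gene_1": 0.50, "gene_2": 0.30, "gene_3": 0.15, "gene_4": 0.05},
    "model_b": {"gene_1": 0.40, "gene_2": 0.10, "gene_3": 0.35, "gene_4": 0.15},
    "model_c": {"gene_1": 0.20, "gene_2": 0.45, "gene_3": 0.25, "gene_4": 0.10},
}

consensus = consensus_ranking(toy_importances)
for gene, avg_rank in consensus:
    print(f"{gene}: mean rank {avg_rank:.2f}")
```

Working with ranks rather than raw scores avoids having to put importance values from different model families on a common scale, which is one reason a rank-based consensus might be preferred; other aggregation schemes would be equally plausible.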