Introduction
The complex relationship between genes and behaviour has fuelled a large body of recent research, and we now know that gene activity can influence brain function, which in turn may affect behaviour. Several studies have shown that behavioural states (distinct and well-characterised behaviours such as foraging or defensive behaviour) can be associated with distinct gene expression profiles in neural tissue, which forms the basis of the neurogenomic approach. For example, large gene networks have been associated with foraging and defence behaviour in honeybees, and numerous candidate neurological genes have been linked to aggression in a variety of organisms, including honeybees and zebrafish. Nonetheless, most studies have focused on behavioural states that are long lasting or inherent to a species (Zayed & Robinson, 2012), whereas more plastic and transient social interactions among members of the same species (or colony) have been less well characterised at the neurogenomic level. This is likely due to the challenges of combining accurate behavioural observations with the complex experimental designs needed to obtain and analyse large sets of gene expression data.
The Western honeybee Apis mellifera has become a model organism for neurogenomics due to its fascinating sociobiology, the ecosystem services it provides as a pollinator and the availability of a fully annotated genome. Honeybees display perhaps one of the most iconic social behaviours in the animal world, the “waggle dance”, in which foragers communicate the location of suitable food sources and possible nest locations to nestmates via stereotyped movements. This complex behaviour was first described in the last century by the Nobel Prize-winning ethologist Karl von Frisch, and since then many details of its ecological, evolutionary, and physiological underpinnings have been characterised. Despite this, we still do not have a complete picture of how the waggle dance is regulated at the brain level. Pioneering studies have started to reveal some of the key players at the levels of molecules, cell types and genetic pathways associated with dance communication, but it is unclear which genes in the honeybee brain, once activated, trigger the performance of dance behaviour.
Traditionally, the neurogenomic approach has consisted of using statistical methods to calculate differential gene expression, which requires robust data analysis techniques due to the large volumes of sequence reads generated per sample. An interesting development in the field, prompted by the increased computational needs of these approaches, has been the application of machine learning (ML) to genomics studies. ML methodologies have proved to be powerful resources for this purpose and have recently been the focus of extensive research aimed at identifying new applications across a wide range of fields in biology and medicine. Despite the abundance of studies applying ML frameworks to transcriptomic data, their use to characterise the molecular regulation of highly plastic and transient behaviours has not yet been properly explored.
In this study, we set out to identify the genes associated with the performance of dance behaviour in honeybee foragers using an ML approach. We obtained a transcriptomic dataset of brain tissue (mushroom bodies) from honeybee foragers that were sampled for another study designed to investigate the molecular basis for learning distance and direction through the waggle dance (Manfredini et al., in prep.). Mushroom bodies were targeted for this study as they are the brain tissue best suited to exploring high cognitive functions in insects, including spatial tasks. We trained three types of ML models (classifiers) on the expression levels of 15,314 genes, corresponding to the whole honeybee genome, with the direct goal of classifying honeybees according to whether or not they performed a waggle dance upon their return from a foraging trip (i.e., dancers vs non-dancers). Thereafter, we unified the information obtained from the different ML approaches to identify the genes associated with these complex behavioural states, and we compared these results with more traditional analyses of gene expression. Together, our study provides deeper insight into the molecular regulation of the waggle dance, a plastic and transient behavioural state, and promotes the incorporation of ML in the analysis of transcriptomic data.
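To make the classification task described above concrete, the following is a minimal sketch of the general idea only: train a classifier on per-gene expression values to separate dancers from non-dancers, estimate its accuracy, and rank genes by how strongly they separate the two classes. This section does not name the three classifier types used in the study, so the nearest-centroid classifier here is an illustrative stand-in, not the authors' method. The data are synthetic, the number of genes is reduced for brevity, and all function names are hypothetical.

```python
import random
import statistics


def make_synthetic_data(n_per_class=10, n_genes=50, n_informative=3, seed=0):
    """Return (samples, labels) of fake expression values (NOT real honeybee data)."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in ("dancer", "non-dancer"):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == "dancer":
                # Assumption: shift the first few genes so the classes differ.
                for g in range(n_informative):
                    row[g] += 5.0
            samples.append(row)
            labels.append(label)
    return samples, labels


def centroids(samples, labels):
    """Mean expression per gene for each class."""
    result = {}
    for cls in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == cls]
        result[cls] = [statistics.fmean(col) for col in zip(*rows)]
    return result


def predict(cents, sample):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, sample))
    return min(cents, key=lambda cls: dist(cents[cls]))


def leave_one_out_accuracy(samples, labels):
    """Hold out each bee in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        cents = centroids(train_x, train_y)
        correct += predict(cents, samples[i]) == labels[i]
    return correct / len(samples)


def rank_genes(samples, labels):
    """Gene indices sorted by absolute difference between class centroids."""
    cents = centroids(samples, labels)
    diffs = [abs(a - b) for a, b in zip(cents["dancer"], cents["non-dancer"])]
    return sorted(range(len(diffs)), key=lambda g: diffs[g], reverse=True)


if __name__ == "__main__":
    X, y = make_synthetic_data()
    print("leave-one-out accuracy:", leave_one_out_accuracy(X, y))
    print("top-ranked gene indices:", rank_genes(X, y)[:3])
```

With far more genes than samples, as in a whole-genome transcriptomic dataset, held-out evaluation of this kind matters because a classifier can otherwise fit noise. The gene ranking shown is a simple proxy for the feature-importance measures that different classifiers provide.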