Discussion
In the present study, we implemented a machine learning (ML) approach to investigate the transcriptomic signatures arising from a complex plastic phenotype. We explored the gene expression profiles of Apis mellifera associated with dance behaviour in order to identify a set of focal genes that could be among the key regulators of this complex behaviour. By training two embedded models (SVM and GLMNET) and one wrapper algorithm (RFE-RF), we achieved perfect accuracy in assigning honeybees to the major behavioural categories that we tested (“dancer” vs. “non-dancer”) on the basis of gene expression data. Using feature selection, we obtained a set of key predictors for each classifier, which were then distilled into a list of genes. Our results highlight a set of genes that are promising candidates and could directly regulate dance behaviour.
While we were able to clearly separate dancers from non-dancers, our preliminary analyses (PCA) did not detect any major effect of distance perception, which was one of the research questions that we had initially pursued. Although the impact of distance perception on gene expression was too subtle to be detected by our approaches, other studies have succeeded in identifying the effect of distance perception alone on honeybee brain gene expression using more traditional statistical tools for transcriptomic analysis. It is possible that, with a larger sample size, we would have been able to investigate this behaviour further. Alternatively, the transcriptomic signature associated with distance perception might be more pronounced in honeybees experiencing real distance than in honeybees experiencing perceived distance, as in our tunnel manipulation setup. Indeed, a larger set of genes was found to differ between foragers experiencing real long distance vs. short distance (Manfredini et al. in prep.), but we cannot exclude that a portion of these genes changed their expression patterns between groups for other reasons, for example the different metabolic costs of flight.
A recurrent problem in transcriptomic analyses is that the number of genes (predictors) is far greater than the number of samples. The fact that different ML and differential gene expression analyses yielded an overlapping subset of genes in this setting indicates that dance behaviour has a unique and well-defined transcriptomic signature. Moreover, our results show that ML can be used as a complementary approach to support transcriptomic studies and help restrict the focus to a smaller group of genes that can be investigated for their association with a behaviour of interest. This approach can also be extended beyond honeybees and the waggle dance: a large proportion of animal behaviours are transient and plastic, which makes them very difficult to characterise, but they could be described by combining traditional gene expression analyses with ML approaches.
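The overlap between gene lists produced by different methods can be quantified in a few lines. The following is a minimal sketch, not the pipeline used in this study: the gene identifiers are hypothetical placeholders, and the Jaccard index is only one of several reasonable overlap measures.

```python
from itertools import combinations

# Hypothetical gene sets: placeholder identifiers, not results from this study.
selected = {
    "SVM": {"geneA", "geneB", "geneC", "geneD"},
    "RFE-RF": {"geneA", "geneB", "geneC", "geneE"},
    "GLMNET": {"geneA", "geneB", "geneF"},
}


def jaccard(a, b):
    """Jaccard index: shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(gene_sets):
    """Return the Jaccard index for every pair of methods."""
    return {
        (m1, m2): jaccard(gene_sets[m1], gene_sets[m2])
        for m1, m2 in combinations(gene_sets, 2)
    }


def consensus(gene_sets):
    """Return the genes selected by every method."""
    return set.intersection(*gene_sets.values())


overlaps = pairwise_overlap(selected)
shared = consensus(selected)

for pair, score in overlaps.items():
    print(pair, round(score, 2))
print(sorted(shared))
```

Genes returned by `consensus` are those on which all methods agree, and can serve as a starting point for a shortlist of candidates to compare against differential expression results.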
Interestingly, the overlap between SVM and RFE-RF was greater than the overlap of either with GLMNET. In high-dimensional settings such as ours, there can be more than one optimum for the learning objective, which is why ML models can end up in different final states on subsequent runs with random initialisation. In our case, it is possible that GLMNET simply converged to a different state from those reached by SVM and RFE-RF, and hence selected different features. It is also possible that, because GLMNET has built-in mechanisms to avoid overfitting, its learning procedure was slower than that of the other methods and needed more training time.
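A toy example may help to illustrate how several equally good solutions can coexist when predictors are redundant. The sketch below is an illustration under simplifying assumptions and not a model of our data: it uses two hypothetical genes with identical expression, a linear model without intercept, and hand-picked weight vectors rather than a fitted classifier.

```python
# Toy data: two hypothetical genes with identical expression across four samples.
expression = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
target = [1.0, 2.0, 3.0, 4.0]


def mse(weights, rows, y):
    """Mean squared error of a linear model with no intercept."""
    errors = [
        (sum(w * x for w, x in zip(weights, row)) - yi) ** 2
        for row, yi in zip(rows, y)
    ]
    return sum(errors) / len(errors)


def l1(weights):
    """L1 norm of the weights (lasso-type penalty)."""
    return sum(abs(w) for w in weights)


def l2_squared(weights):
    """Squared L2 norm of the weights (ridge-type penalty)."""
    return sum(w * w for w in weights)


# Hand-picked candidate solutions that rely on different genes.
candidates = {
    "gene 1 only": (1.0, 0.0),
    "gene 2 only": (0.0, 1.0),
    "both genes": (0.5, 0.5),
}

losses = {name: mse(w, expression, target) for name, w in candidates.items()}
penalties = {name: l1(w) for name, w in candidates.items()}
ridge = {name: l2_squared(w) for name, w in candidates.items()}

for name in candidates:
    print(name, losses[name], penalties[name], ridge[name])
```

All three candidates fit the toy data equally well and have the same L1 norm, so neither the fit nor a lasso-type penalty distinguishes them, whereas a ridge-type penalty favours spreading the weight across both genes. Differences of this kind in the learning objective are one way in which methods can select different features from correlated predictors.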
Nevertheless, the extensive overlap between the SVM and RFE-RF approaches, and the fact that many of the identified genes were also identified by traditional methods of gene expression analysis, show great promise. We hypothesise that these genes are the best predictors of dance behaviour, as they all appear to be expressed at higher levels in dancers vs. non-dancers. In particular, the two genes that are common to all three ML approaches and were also identified by at least one approach of gene expression analysis deserve special attention. Boss (bride of sevenless) belongs to the group of G-protein coupled receptors, an important family of genes often associated with the expression of behaviour in insects. In particular, boss has been associated with a range of functions in Drosophila, including sight and eye development, energy homeostasis and response to glucose. Boss might have been co-opted in honeybees to regulate dance behaviour, an energetically expensive activity that is closely related to feeding behaviour (and therefore to sugar response) and relies on visual input for orientation during flight to a foraging site. As for heterogeneous nuclear ribonucleoprotein A1, studies on the Drosophila orthologue HRP59 have revealed a role for this gene in alternative splicing, a molecular process that allows a single pre-mRNA transcript to give rise to multiple protein variants, significantly increasing the repertoire of responses to a stimulus. Even though the role of alternative splicing in the regulation of behaviour is largely unknown, this process has started to be characterised in multiple organisms, including honeybees, hinting at the possibility that the honeybee orthologue of HRP59 might contribute to the high plasticity that is necessary to regulate a complex behaviour such as the waggle dance.
More functional approaches are needed to move beyond correlation and investigate whether a causal link exists between the expression levels of the genes that we identified and the performance of dance behaviour. If further research were to support such a link, the results could be used to test the recruitment potential of a specific colony. A diagnostic tool that directly measures the expression levels of the focal genes and compares them against a reference would make it possible to assess the overall ability of a colony to recruit to a foraging site through dancing.
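As a rough illustration of what such a comparison against a reference could look like, the sketch below averages per-gene z-scores. Everything in it is an assumption made for illustration: the gene names, the expression values, the function `recruitment_score` and the choice of a mean z-score are hypothetical, and no such tool has been validated.

```python
from statistics import mean, stdev

# Hypothetical reference expression values (arbitrary units) per focal gene.
reference = {
    "gene_1": [10.0, 12.0, 11.0, 13.0],
    "gene_2": [5.0, 6.0, 7.0, 6.0],
}


def recruitment_score(sample, reference):
    """Mean z-score of the focal genes in `sample` relative to the reference."""
    z_scores = []
    for gene, ref_values in reference.items():
        mu, sd = mean(ref_values), stdev(ref_values)
        z_scores.append((sample[gene] - mu) / sd)
    return mean(z_scores)


# Hypothetical measurement from a colony of interest.
colony_sample = {"gene_1": 15.0, "gene_2": 8.0}
score = recruitment_score(colony_sample, reference)
print(round(score, 2))
```

Under these assumptions, a positive score would indicate that the focal genes are expressed above the reference level, in line with the higher expression observed in dancers; any practical threshold would need to be established experimentally.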