Comparison with Standard Gene Expression Analyses
We characterized gene expression patterns in the same groups of honeybees as above with standard statistical approaches to identify possible elements in common with the ML approaches that we tested. The LRT approach identified 243 genes whose expression differed significantly between any two groups of bees, while the GLM approach identified 373 genes that differed specifically between dancers and non-dancers (see Supplemental Information for the lists of these genes). We then tested each of these two gene sets for overlap with the list of 18 genes selected by the machine learning approaches. This yielded 5 genes in common for the LRT approach and 9 genes in common for the GLM approach. Both overlaps are statistically significant (representation factors: 17.5 and 20.5, respectively; p < 0.001 in both comparisons), indicating more genes in common than expected by chance. Five genes were shared across all analyses that we performed (see Table 2). Interestingly, all focal genes identified by the ML approaches (including those shared with the gene expression analyses) were expressed at higher levels in dancers, suggesting their possible involvement in the regulation of dancing behaviour (see Figure 5).
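As an illustration of how an overlap statistic of this kind can be computed, the following minimal Python sketch calculates a representation factor (observed overlap divided by the overlap expected by chance) and an upper-tail hypergeometric p-value. The function names and the background size of 15,300 genes are assumptions made for this example; the text does not state the total number of genes tested or the exact test used, so the output is not expected to reproduce the reported values exactly.

```python
from math import comb


def representation_factor(overlap, size_a, size_b, total):
    """Observed overlap divided by the overlap expected by chance."""
    expected = size_a * size_b / total
    return overlap / expected


def overlap_p_value(overlap, size_a, size_b, total):
    """Upper-tail hypergeometric probability of seeing at least
    `overlap` shared genes when size_b genes are drawn at random
    from `total` genes, of which size_a belong to the other set."""
    denominator = comb(total, size_b)
    upper = min(size_a, size_b)
    numerator = sum(
        comb(size_a, k) * comb(total - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator


# ASSUMPTION: background gene count, chosen for illustration only.
TOTAL_GENES = 15300
ML_GENES = 18  # genes selected by the ML approaches (from the text)

# Set sizes and overlaps as reported in the text.
comparisons = {"LRT": (243, 5), "GLM": (373, 9)}

for name, (set_size, shared) in comparisons.items():
    rf = representation_factor(shared, set_size, ML_GENES, TOTAL_GENES)
    p = overlap_p_value(shared, set_size, ML_GENES, TOTAL_GENES)
    print(f"{name}: representation factor = {rf:.2f}, p = {p:.2e}")
```

A representation factor above 1 indicates more shared genes than expected by chance, and the hypergeometric tail probability quantifies how unlikely an overlap at least that large would be under random sampling.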