Comparison with Standard Gene Expression Analyses
We characterized gene expression patterns in the same groups of
honeybees as above with standard statistical approaches in order to
identify possible elements in common with the ML approaches that we
tested. The LRT approach identified 243 genes whose expression differed
significantly between any two groups of bees, while the GLM approach
identified 373 genes whose expression differed specifically between
dancers and non-dancers (see Supplemental Information for the lists of these
genes). We performed overlap analyses between these two gene sets and
the list of 18 genes selected by the machine learning approaches. This
yielded 5 genes in common with the LRT gene set and 9 genes in common
with the GLM gene set. Both overlaps were statistically significant
(representation factors of 17.5 and 20.5, respectively; p-values < 0.001
in both comparisons), indicating more genes in common than expected by
chance. Five genes were shared across all analyses that we performed
(see Table 2). Interestingly, all focal genes identified by the ML
approaches (including those shared with the gene expression analyses)
were expressed at higher levels in dancers, suggesting their possible
involvement in the regulation of dancing behaviour (see Figure 5).
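
For readers who wish to check the overlap statistics, the following is a minimal sketch of how a representation factor and an associated p-value are commonly computed. It is an illustration under stated assumptions, not necessarily the procedure used here: the representation factor is taken as the observed overlap divided by the overlap expected by chance, the p-value is taken as the upper tail of a hypergeometric distribution, and the background gene count (`BACKGROUND = 15314`) is an assumed value that is not given in this section. The function and variable names are hypothetical.

```python
from math import comb


def representation_factor(overlap, size_a, size_b, background):
    """Observed overlap divided by the overlap expected by chance."""
    expected = size_a * size_b / background
    return overlap / expected


def hypergeom_upper_tail(overlap, size_a, size_b, background):
    """P(X >= overlap) when size_b genes are drawn without replacement
    from `background` genes, of which size_a belong to the first set."""
    total = comb(background, size_b)
    upper = min(size_a, size_b)
    favourable = sum(
        comb(size_a, i) * comb(background - size_a, size_b - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# ASSUMPTION: background gene count; not stated in this section.
BACKGROUND = 15314
ML_GENES = 18  # genes selected by the ML approaches (from the text)

# Set sizes and overlaps as reported in the text.
for label, set_size, overlap in [("LRT", 243, 5), ("GLM", 373, 9)]:
    rf = representation_factor(overlap, set_size, ML_GENES, BACKGROUND)
    p = hypergeom_upper_tail(overlap, set_size, ML_GENES, BACKGROUND)
    print(f"{label}: representation factor = {rf:.1f}, p = {p:.2e}")
```

With the assumed background size, the representation factors round to 17.5 (LRT) and 20.5 (GLM), and both tail probabilities fall below 0.001. A different background gene count would change these values proportionally.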