Data analysis
All analyses were performed in R (version 3.6.1) with RStudio (version 1.2.1335). All graphs were created with the ggplot2 package (Wickham, 2016). Because the data were not normally distributed, we analyzed the fecundity and sperm viability data with generalized linear mixed models (GLMMs) with the appropriate error structure. The fecundity data contained an excess of zero counts, so we used zero-inflated models as advised by Zuur et al. (2009). A single model combined a binomial part, describing the likelihood that a replicate was sterile, with a count part, describing the number of offspring with a negative binomial distribution. These models required the packages pscl (Jackman, 2017), lmtest (Zeileis and Hothorn, 2002) and glmmTMB (Brooks et al., 2017). The model contained the factors temperature, day and day², as well as the temperature × day and temperature × day² interactions. For the sperm viability analysis, we accounted for pseudoreplication because sperm from the same male were measured at three time points. In this model, time point and temperature were included as fixed factors.
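The structure of a zero-inflated negative binomial model can be illustrated with a minimal sketch of its probability mass function. This is not the code used for the analysis, which was run in R; the function names and the parameter values (pi_zero, mu, theta) are hypothetical and chosen only for illustration. The sketch assumes the common parameterization in which the negative binomial variance is mu + mu²/theta.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial probability of count y.

    Assumed parameterization: mean mu, dispersion theta,
    variance mu + mu**2 / theta.
    """
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, pi_zero, mu, theta):
    """Zero-inflated negative binomial probability of count y.

    pi_zero is the probability of a structural zero (binomial part,
    e.g. a sterile replicate); the remaining 1 - pi_zero follows the
    negative binomial count part.
    """
    count_part = (1.0 - pi_zero) * nb_pmf(y, mu, theta)
    if y == 0:
        # Zeros arise from both the structural and the count process.
        return pi_zero + count_part
    return count_part


# Hypothetical values: 30% structural zeros, mean 20 offspring, theta 2.
p_zero_plain = nb_pmf(0, mu=20.0, theta=2.0)
p_zero_inflated = zinb_pmf(0, pi_zero=0.3, mu=20.0, theta=2.0)
print(p_zero_plain, p_zero_inflated)
```

In a fitted model, pi_zero and mu would each depend on the predictors through a link function rather than being fixed constants as they are here.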
Fertility in the fecundity assay, male organ size (wing length, AG and SV size), egg-to-adult survival and behavior in the sperm competition experiment were analyzed with generalized linear models (GLMs) with the appropriate error structure, correcting for overdispersion with the quasi-extension where necessary. The significance of each factor was tested through an analysis of deviance: the factor was removed from the full model and the resulting change in deviance was tested against an F or chi-square distribution, as appropriate for the error structure (Crawley, 2007). We present models containing only the retained significant factors. Most statistical analyses were performed in two ways. In the first approach, all five treatments (defined by developmental temperature and the opportunity to recover or not) were coded as five levels of a single treatment factor. In the second approach, larval temperature and recovery were included as separate factors, which allowed comparisons only among heat-challenged males and excluded the data from control males. Because control males both remained at their developmental temperature and were ‘allowed’ to recover, they could not be assigned to either level of the recovery factor, which ruled out a full-factorial design with two independent treatment factors. We always report the first approach unless otherwise specified.
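The logic of the analysis of deviance can be sketched as follows. This is a simplified illustration and not the R code used for the analysis; the function names and all numbers are hypothetical. It assumes a Poisson-type variance function for the dispersion estimate and computes only the test statistics, not the p-values.

```python
def pearson_dispersion_poisson(observed, fitted, n_params):
    """Pearson estimate of the dispersion parameter.

    Assumes a Poisson variance function, V(mu) = mu. Values well above 1
    suggest overdispersion, which motivates a quasi-likelihood model.
    """
    pearson_chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    residual_df = len(observed) - n_params
    return pearson_chi2 / residual_df


def deviance_test_statistic(dev_reduced, dev_full, df_reduced, df_full,
                            dispersion=None):
    """Statistic for dropping a factor from the full model.

    dev_* are residual deviances and df_* residual degrees of freedom.
    Without a dispersion estimate, the change in deviance is compared
    with a chi-square distribution. With one (quasi models), the scaled
    change in deviance per degree of freedom is compared with an F
    distribution.
    """
    delta_dev = dev_reduced - dev_full
    delta_df = df_reduced - df_full
    if dispersion is None:
        return "chi-square", delta_dev, delta_df
    return "F", (delta_dev / delta_df) / dispersion, delta_df


# Hypothetical deviances for a full model and a model without one factor.
print(deviance_test_statistic(130.0, 100.0, 48, 46))
print(deviance_test_statistic(130.0, 100.0, 48, 46, dispersion=2.0))
```

A factor is retained when removing it increases the deviance by more than expected by chance under the reference distribution.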
A chi-square test was used to analyze sperm presence in the SVs and the mating and remating rates in the sperm competition experiment (Dytham, 2011). Allometry between AG size and wing length was tested with a regression. For this, both variables were converted into the same units (µm²) and log-transformed before analysis (Shingleton et al., 2007). Day was included as a fixed factor in the model to account for ontogenetic allometry.
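The allometric regression can be illustrated with a minimal sketch: after log transformation, the slope of an ordinary least-squares regression estimates the allometric coefficient, and a slope of 1 indicates isometry when both traits are expressed in the same units. This is not the code used for the analysis. The function name and the data are hypothetical, and the sketch omits the day factor included in the actual model.

```python
import math


def loglog_slope(body_size, organ_size):
    """Least-squares fit of log(organ_size) on log(body_size).

    Returns (slope, intercept). The slope is the allometric coefficient b
    in organ_size = a * body_size**b, and exp(intercept) estimates a.
    """
    log_x = [math.log(v) for v in body_size]
    log_y = [math.log(v) for v in organ_size]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical, noise-free data generated with b = 0.8 (hypoallometry).
body = [100.0, 200.0, 400.0, 800.0]
organ = [3.0 * v ** 0.8 for v in body]
slope, intercept = loglog_slope(body, organ)
print(slope, math.exp(intercept))
```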
The multcomp package (Hothorn et al., 2008) was used for post-hoc comparisons of wing length. Differences in AG size between temperature treatments were analyzed with pairwise t tests.
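As an illustration of pairwise t tests, the sketch below computes a classic two-sample t statistic with pooled variance for every pair of groups. It is not the code used for the analysis, and the text does not specify the exact variant of the test (for example, how the variance was pooled or whether p-values were adjusted), so these choices are assumptions. The function names, group labels and data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in the
    two groups.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_error = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean(a) - mean(b)) / std_error, na + nb - 2


def pairwise_t(groups):
    """Apply pooled_t to every pair of groups in a dict of samples."""
    return {
        (g1, g2): pooled_t(groups[g1], groups[g2])
        for g1, g2 in combinations(groups, 2)
    }


# Hypothetical measurements for three hypothetical temperature treatments.
example = {
    "low": [1.0, 2.0, 3.0],
    "mid": [2.0, 3.0, 4.0],
    "high": [4.0, 5.0, 6.0],
}
for pair, (t_stat, df) in pairwise_t(example).items():
    print(pair, round(t_stat, 3), df)
```

Each t statistic would then be compared with a t distribution with the given degrees of freedom to obtain a p-value.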