The effect size measure
The calculation of the small-sample bias-corrected log response ratio and of Hedges’ d both rely on the SDs of the control and treatment groups. Imputing missing SDs thus affects both the effect sizes and the effect size weights. For the simple log response ratio and Fisher’s z, the imputation of missing SDs and/or SSs only affects the effect size weights.
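The dependence on SDs can be made explicit with a minimal sketch of the four measures. It assumes the commonly used textbook estimators (pooled-SD Hedges’ d with the small-sample correction factor J, the delta-method variance of the log response ratio, and a second-order bias correction for the log response ratio); these may differ in detail from the estimators used in our simulations, and all function names are illustrative.

```python
import math

def hedges_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Hedges' d and its variance (assumed standard pooled-SD form)."""
    df = n_t + n_c - 2
    s_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    j = 1 - 3 / (4 * df - 1)              # small-sample correction factor
    d = j * (mean_t - mean_c) / s_pooled  # the SDs enter the effect size itself
    var = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return d, var

def log_response_ratio(mean_t, mean_c, sd_t, sd_c, n_t, n_c, bias_corrected=False):
    """Log response ratio and its variance; the correction terms are assumed."""
    term_t = sd_t**2 / (n_t * mean_t**2)
    term_c = sd_c**2 / (n_c * mean_c**2)
    lnrr = math.log(mean_t / mean_c)      # simple form: no SDs involved
    var = term_t + term_c                 # SDs and SSs enter the weight only
    if bias_corrected:
        lnrr += 0.5 * (term_t - term_c)   # now the SDs enter the effect size
        var += 0.5 * (term_t**2 + term_c**2)
    return lnrr, var

def fisher_z(r, n):
    """Fisher's z and its variance; only the SS enters, via the weight."""
    return math.atanh(r), 1 / (n - 3)
```

Doubling both SDs in this sketch leaves the simple log response ratio unchanged while inflating its variance, whereas Hedges’ d and the bias-corrected log response ratio both shift.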
The type of missing data
Our simulations show that missing SSs could, and generally should, be imputed routinely, albeit with caution in the case of a correlation between effect sizes and sample sizes in the Fisher’s z dataset. Some studies might not report their actual SSs but rather indicate a lower or upper bound (e.g. if an unknown number of samples were excluded from the presented analyses). Such information can be used to restrict the range of imputed values, which is possible with the following imputation methods: linear regression, predictive mean matching, classification and regression trees, random forest, Bayes predictive mean matching and bootstrap expectation maximization.
For the log response ratio and Hedges’ d, the treatment of missing SDs will have a stronger effect on the grand mean and its confidence interval than the treatment of missing SSs. What we did not investigate in our simulations is the effect of the range and distribution of SDs and/or SSs. Larger ranges and non-uniform distributions of SDs and/or SSs are likely to result in higher variability of imputed values and thus wider confidence intervals. Meta-analyses that summarize findings from different study designs (e.g. across observational and experimental studies or across different organism groups) could harbour wider and more uneven distributions of SDs and/or SSs than those we simulated in this study.
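How a reported bound can restrict imputed values is illustrated below with a deliberately simplified form of predictive mean matching. The sketch assumes a single predictor and an ordinary least-squares fit without the parameter draws that full implementations use; the function names and the `bounds` argument are hypothetical and do not correspond to any particular software interface.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope

def pmm_impute_bounded(predictor, target, bounds=None, k=3, seed=0):
    """Fill None entries of `target` with observed donor values.

    `bounds` maps the index of a missing entry to a (lower, upper) pair;
    only donors whose observed value lies inside that range are eligible.
    """
    rng = random.Random(seed)
    bounds = bounds or {}
    observed = [i for i, t in enumerate(target) if t is not None]
    intercept, slope = fit_line([predictor[i] for i in observed],
                                [target[i] for i in observed])
    completed = list(target)
    for i, t in enumerate(target):
        if t is not None:
            continue
        lower, upper = bounds.get(i, (float("-inf"), float("inf")))
        donors = [j for j in observed if lower <= target[j] <= upper]
        if not donors:
            continue  # no admissible donor: the value stays missing
        predicted_i = intercept + slope * predictor[i]
        # Rank admissible donors by closeness of their predicted values.
        donors.sort(key=lambda j: abs(intercept + slope * predictor[j] - predicted_i))
        completed[i] = target[rng.choice(donors[:k])]
    return completed
```

Filtering the donor pool before matching guarantees that every imputed value respects the reported bound, at the cost of leaving a value missing when no observed value falls within the range.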
The mechanism leading to the observed pattern of missingness
Following our simulation results, data that are missing completely at random (MCAR) or missing at random (MAR) could, and generally should, be imputed routinely. For Hedges’ d, data that are missing not at random (MNAR) introduced a deviation in the grand mean (in comparison to a fully informed weighted meta-analysis), regardless of how such missing data were treated. Imputation via bootstrap expectation maximization might yield a weaker deviation in grand means, but the applied algorithm frequently failed if more than 60% of SDs and/or SSs were missing. Manual fine-tuning of the respective algorithm parameters might increase its success rate.
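The three mechanisms can be distinguished by what drives the probability of a value being deleted. The following sketch is one possible way to generate such patterns and is not the deletion procedure used in our simulations: the rank-based deletion weights, the function name and the `covariate` argument are assumptions made for illustration.

```python
import random

def delete_values(values, covariate, mechanism, fraction, seed=0):
    """Return a copy of `values` with a given fraction set to None.

    MCAR: every value is equally likely to be deleted.
    MAR:  deletion probability increases with an observed covariate.
    MNAR: deletion probability increases with the value itself.
    """
    rng = random.Random(seed)
    n = len(values)
    n_missing = round(fraction * n)
    if mechanism == "MCAR":
        weights = [1.0] * n
    elif mechanism in ("MAR", "MNAR"):
        driver = covariate if mechanism == "MAR" else values
        order = sorted(range(n), key=lambda i: driver[i])
        weights = [0.0] * n
        for rank, i in enumerate(order, start=1):
            weights[i] = float(rank)  # larger driver, larger deletion weight
    else:
        raise ValueError("mechanism must be 'MCAR', 'MAR' or 'MNAR'")
    # Weighted sampling without replacement: draw a key u**(1/w) per item
    # and delete the items with the largest keys.
    keys = [rng.random() ** (1.0 / w) for w in weights]
    ranked = sorted(range(n), key=lambda i: keys[i], reverse=True)
    missing = set(ranked[:n_missing])
    return [None if i in missing else v for i, v in enumerate(values)]
```

Under the MNAR setting of this sketch, large SDs are preferentially removed, so the remaining values no longer represent the full distribution and no imputation model fitted to them can fully recover it.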
Relationships between effect sizes and SDs
Imputation methods that apply a predictive model, i.e. all methods except mean, median and random sample value imputation, could account for a relationship between effect sizes and effect size precision. In the case of such a relationship, the algorithms that used predictive mean matching tended to yield grand means most similar to those from fully informed weighted analyses. In the case of correlated effect sizes and SSs in the Fisher’s z dataset, imputing missing data via mean, median, random sample and non-parametric random forest imputation introduced a stronger deviation of the grand mean than omitting the incompletely reported studies.
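Why imputing SSs without regard to effect sizes can shift the grand mean is easiest to see in the weighting itself. The sketch below assumes a fixed-effect model with inverse-variance weights of n − 3 for Fisher’s z, which is simpler than the models fitted in our simulations; the toy data and function names are purely illustrative.

```python
def weighted_grand_mean(z_values, sample_sizes):
    """Inverse-variance weighted mean of Fisher's z (weight = n - 3)."""
    weights = [n - 3 for n in sample_sizes]
    return sum(w * z for w, z in zip(weights, z_values)) / sum(weights)

def mean_impute(sample_sizes):
    """Replace missing SSs (None) by the mean of the reported SSs."""
    reported = [n for n in sample_sizes if n is not None]
    fill = sum(reported) / len(reported)
    return [fill if n is None else n for n in sample_sizes]

def omit_incomplete(z_values, sample_sizes):
    """Drop every study whose SS is missing."""
    kept = [(z, n) for z, n in zip(z_values, sample_sizes) if n is not None]
    return [z for z, _ in kept], [n for _, n in kept]

# Illustrative toy data: larger studies report smaller effects.
z_values = [0.70, 0.55, 0.40, 0.20, 0.10]
full_ss = [10, 15, 20, 80, 100]
reported_ss = [10, None, 20, None, 100]  # two studies lack an SS

grand_mean_full = weighted_grand_mean(z_values, full_ss)
grand_mean_imputed = weighted_grand_mean(z_values, mean_impute(reported_ss))
grand_mean_omitted = weighted_grand_mean(*omit_incomplete(z_values, reported_ss))
```

Mean imputation assigns the same weight to every incompletely reported study, regardless of its effect size. A predictive model can instead assign larger imputed SSs to studies whose effect sizes resemble those of large, completely reported studies.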