Appendix B: Power analyses and Monte Carlo simulations
To support our claim that our sample sizes are sufficient to detect differences between the groups of interest (i.e., SSI radiation, SSI generalist-only, and Caribbean), we include the results of power analyses. These analyses calculate the minimum required sample size across the ranges of effect sizes (eta2: 0.0002-0.12), numbers of individuals per population (n = 7-21), and correlations of trait estimates for individuals (corr: 0.00046-0.50) observed in our dataset. From these iterative power analyses, we determined the combinations of parameters for which our sample sizes were sufficient to detect differences between groups with 80% power. We compared these cutoffs to the observed values for each trait and population (Figure B1). Ultimately, we found that we have enough samples to detect differences between groups for all traits except ascending process length, orbit diameter, and cranial height.
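Many power calculators take Cohen's f rather than eta2 as the effect-size input. The text does not state which software or input convention was used, so the following is only a minimal sketch of the standard conversion, f = sqrt(eta2 / (1 - eta2)), applied to the endpoints of the reported eta2 range. The function name is hypothetical.

```python
import math


def eta2_to_cohens_f(eta2):
    """Convert eta-squared to Cohen's f (hypothetical helper, not from the paper)."""
    if not 0.0 <= eta2 < 1.0:
        raise ValueError("eta2 must lie in [0, 1)")
    return math.sqrt(eta2 / (1.0 - eta2))


# Endpoints of the effect-size range reported in the text.
f_low = eta2_to_cohens_f(0.0002)
f_high = eta2_to_cohens_f(0.12)
```

Under this conversion, the reported eta2 range corresponds to Cohen's f values of roughly 0.014 to 0.37.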
We also performed Monte Carlo simulations to investigate the sample sizes at which estimates of variation stabilize, based on the observed means and standard deviations of traits from our dataset. To perform these simulations, we first calculated the average trait values (range: -0.71 to 0.42) and standard deviations (range: 0.01 to 0.8) for each trait across the potential groups of our dataset (19 groups representing unique species per population, plus the three additional parameter estimates associated with the Caribbean, SSI generalist-only, and SSI radiating groups). We used these observed means and standard deviations to generate normal distributions representing hypothetical populations to sample from. We resampled from these distributions using sample sizes from 1 to 100 for 1000 iterations, recalculating the mean each time. For each of our 22 groups we ended up with 1,800,000 independent estimates of means across the sample size range of 1 to 100. At each sample size we then calculated the standard deviation of these means as a metric of the amount of variation introduced by sample size. As expected, standard deviation decreased as sample size increased. However, the amount of variation at low sample sizes depended on the original parameters of the dataset (Figure B2 A&B).
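The resampling procedure described above can be sketched as follows for a single combination of mean and standard deviation. This is a minimal illustration using only the Python standard library, not the code used for the analysis. The function name, argument names, and fixed seed are assumptions introduced here.

```python
import random
import statistics


def sd_of_means_by_sample_size(mu, sigma, max_n=100, n_iter=1000, seed=0):
    """Return {sample size: SD of simulated sample means}.

    For each sample size n from 1 to max_n, draw n values from
    Normal(mu, sigma), take their mean, repeat n_iter times, and record
    the standard deviation of those means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sd_by_n = {}
    for n in range(1, max_n + 1):
        means = [
            statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
            for _ in range(n_iter)
        ]
        sd_by_n[n] = statistics.stdev(means)
    return sd_by_n
```

Calling, for example, `sd_of_means_by_sample_size(0.0, 0.5)` with illustrative parameter values returns one SD per sample size. The SDs should fall roughly as sigma / sqrt(n), which is the decline in variation with sample size noted above.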
To determine the sample sizes at which variation stabilized (i.e., when adding more samples did not significantly affect the estimates of SD), we grouped sample sizes into sets of 5 (e.g., 1-5, 6-10, 11-15, etc.) and calculated the derivatives for each of these groups across the sampling range. We used a one-sample t-test to determine the sampling range at which the derivative no longer differed significantly from zero, using a conservative cutoff for non-significance of p ≥ 0.1. Here, a derivative of zero indicates that SD is no longer significantly affected by the addition of more samples (i.e., variation is stable). For each of the observed parameters from our dataset, we determined the sample size at which the derivative first showed no significant deviation from zero and visualized these results using a histogram (Figure B2.C). Overall, we found that the median sample size at which stabilization occurred was 20 individuals (range: 5-35 individuals; Figure B2.C).
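The stabilization test can be sketched as below. The text does not specify how the derivatives were computed, so this sketch assumes forward differences between consecutive sample sizes. It also replaces the p-value with the equivalent comparison against the two-sided critical t value for alpha = 0.1 with 4 degrees of freedom (approximately 2.132), which applies only to bins of exactly five derivatives. All names are hypothetical.

```python
import math
import statistics

# Two-sided critical t value for alpha = 0.1 and df = 4 (five derivatives per bin).
# |t| <= T_CRIT_DF4 is equivalent to p >= 0.1 for a one-sample t-test with df = 4.
T_CRIT_DF4 = 2.132


def first_stable_bin(sd_by_n, bin_width=5, t_crit=T_CRIT_DF4):
    """Return (first n, last n) of the first bin whose mean derivative does
    not differ significantly from zero, or None if no bin qualifies.

    sd_by_n[i] is the SD of simulated means at sample size i + 1.
    t_crit must match df = bin_width - 1 if bin_width is changed.
    """
    # Assumed derivative: forward difference SD(n + 1) - SD(n).
    slopes = [b - a for a, b in zip(sd_by_n, sd_by_n[1:])]
    # Only complete bins are tested.
    for start in range(0, len(slopes) - bin_width + 1, bin_width):
        window = slopes[start:start + bin_width]
        mean = statistics.fmean(window)
        se = statistics.stdev(window) / math.sqrt(bin_width)
        if se == 0:
            stable = mean == 0  # no spread: stable only if exactly flat
        else:
            stable = abs(mean / se) <= t_crit
        if stable:
            return (start + 1, start + bin_width)
    return None
```

The input is a list of SDs ordered by sample size, such as the output of the Monte Carlo simulations described above. A noiseless curve that declines smoothly never passes this test, because its derivatives are consistently negative. Stabilization is detected once simulation noise in the derivatives is large relative to their mean.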