Suggestions for more robust statistical analyses in sequencing studies
Data generated from amplicon sequencing is inherently compositional and
provides relative abundances, which are independent of the total
microbial load of the original sample. It has been previously shown that
analyzing compositional datasets with standard statistical techniques
(including Pearson correlations or t tests on proportions) can
lead to very high (up to 100%) false positive discovery rates
(56, 57). With
false positive rates this high, almost any data set will appear to
correlate with microbiome data. For soil science this risk is
unprecedented, because microbiome data comprise thousands of
individual variables. The ease of obtaining significant results may
therefore also lead to an “abuse” of statistical significance (also
referred to as “p-hacking”). While exploratory analysis is useful,
researchers should always remember that an effect or association does
not exist just because it was statistically significant and, more
importantly, that inference should be scientific and not merely
statistical. In
recent years, the debate around the misuse of p-values and their
importance has intensified (58–60), and some alternatives have been
proposed (60), including the use of more stringent p-value thresholds
for claims of new discoveries (61, 62). Clearly, the issue goes well
beyond a simple critique of the p-value and involves scientific
research at all levels, including the publish-or-perish culture
ingrained in academia. We therefore refer the reader to the citations
above to explore this topic further.
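To make the compositional effect described above concrete, the following minimal sketch simulates three taxa whose absolute abundances are drawn independently and then converts them to relative abundances, as amplicon sequencing effectively does. The distributions, parameter values, and function names are illustrative assumptions, not taken from the cited studies. Two taxa that are uncorrelated in absolute terms become strongly correlated once a third, highly variable taxon changes the total.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_samples=500, seed=1):
    """Return (r_absolute, r_relative) for two independently drawn taxa.

    Hypothetical setup: taxon A dominates and varies widely between
    samples; taxa B and C are drawn independently of A and of each other.
    """
    rng = random.Random(seed)
    abs_b, abs_c, rel_b, rel_c = [], [], [], []
    for _ in range(n_samples):
        a = rng.lognormvariate(5.0, 1.0)  # highly variable dominant taxon
        b = rng.gauss(100.0, 5.0)         # independent, nearly constant
        c = rng.gauss(100.0, 5.0)         # independent, nearly constant
        total = a + b + c                 # total load, not seen by sequencing
        abs_b.append(b)
        abs_c.append(c)
        rel_b.append(b / total)           # closure to relative abundance
        rel_c.append(c / total)
    return pearson(abs_b, abs_c), pearson(rel_b, rel_c)


r_absolute, r_relative = simulate()
print(f"absolute abundances: r = {r_absolute:.2f}")
print(f"relative abundances: r = {r_relative:.2f}")
```

In this toy setting the correlation between the relative abundances of B and C is driven entirely by the shared denominator, which is why a standard Pearson test on proportions would report an association that does not exist in the underlying community.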
Nevertheless, the risk of drawing false conclusions from spurious
correlations remains, and it is compounded by the variability inherent
in amplicon sequencing data. When adopting a “let’s sequence and see”
approach, many correlations (including false positives) will be
generated. Given that exploratory research often leads to follow-up
research, increasing our confidence in the initial results reduces the
chance that later work is built on unsubstantiated findings. Adopting
a more stringent p-value threshold will reduce the false positive
rate, at the cost of more type II errors (false negatives). It has
been shown that adopting a more stringent threshold while maintaining
statistical power requires a 70% increase in sample size. We
understand that this is often unrealistic, but we also recognize that
it could save future efforts that would otherwise build on
unsubstantiated research. Instead, current research often focuses on
expanding the depth of analyses on the same few samples at the expense
of replication.
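The trade-off between a stricter threshold and sample size can be illustrated with a short calculation. The sketch below rests on assumptions that the text does not state: the threshold is lowered from 0.05 to 0.005, the target power is 80%, and the test is a two-sided z-test on a fixed effect size, for which the required sample size is proportional to the square of the sum of the two normal quantiles. The function name is hypothetical.

```python
from statistics import NormalDist


def relative_sample_size(alpha_old, alpha_new, power=0.80):
    """Ratio of required sample sizes for a two-sided z-test.

    For a fixed effect size, n is proportional to
    (z_{1 - alpha/2} + z_{power})**2, so the effect size cancels out.
    """
    z = NormalDist().inv_cdf          # standard normal quantile function
    z_power = z(power)
    n_old = (z(1 - alpha_old / 2) + z_power) ** 2
    n_new = (z(1 - alpha_new / 2) + z_power) ** 2
    return n_new / n_old


# Assumed thresholds: 0.05 (conventional) versus 0.005 (more stringent).
ratio = relative_sample_size(0.05, 0.005)
print(f"required sample size grows by about {100 * (ratio - 1):.0f}%")
```

Under these assumptions the increase comes out close to the 70% figure quoted above. Other tests or power targets would give somewhat different numbers.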
Soil intraplot variability and the number of replicates used to analyze similarities or dissimilarities between microbial communities directly affect the ability to detect differences. To showcase how increasing sample size can increase statistical power in soil microbiome analyses, we calculated how the statistical power of permutational multivariate analysis of variance (PERMANOVA) depends on effect size for different numbers of replicates. Although the data set chosen \cite{Zheng_2019} captures a wide range of possible microbial communities, it is far from representative of all possible soil environments. We therefore caution the reader to interpret the data shown only as an example. We used the R package micropower \cite{Kelly_2015}, which simulates distance matrices from a set of parameters in order to estimate the available PERMANOVA power or the necessary sample size for a planned microbiome analysis. We used data from both the 16S rRNA gene and the ITS1 region, filtered to include only bacteria and archaea (16S) and fungi (ITS). We calculated the Jaccard similarity index (Supplementary Fig. 1a,b) and used its average and standard deviation across all samples as parameters in the micropower package to simulate OTU/ASV tables with similar properties. We also calculated the average statistical power for a range of effect sizes ($\omega^2$) for the 16S data (Fig. 3b), defined as 'Low' (0.001–0.04), 'Medium' (0.04–0.08) and 'High' (0.08–0.12). Our analysis shows that, while the number of replicates does not affect the statistical power when differences between microbial communities are strong, increasing the number of replicates from 4 to 5 almost doubled the statistical power for small effect sizes ('Low') and achieved a power above 0.8 for medium effect sizes. These effects were even stronger when we doubled the number of replicates (from 4 to 8). Similar effects were obtained for the fungal data set (Supplementary Fig. 1c).
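For readers unfamiliar with the quantities involved, the sketch below shows how a Jaccard distance and a PERMANOVA effect size can be computed from presence/absence data. It is an illustration only: it does not call the micropower package, the sample data and function names are hypothetical, and it assumes the commonly used estimator $\omega^2 = (SS_A - (a-1)\,MS_W) / (SS_T + MS_W)$, where $a$ is the number of groups and the sums of squares are derived from the squared pairwise distances.

```python
from itertools import combinations


def jaccard_distance(taxa_a, taxa_b):
    """1 - |intersection| / |union| for two sets of observed taxa."""
    union = taxa_a | taxa_b
    if not union:
        return 0.0
    return 1.0 - len(taxa_a & taxa_b) / len(union)


def permanova_omega2(samples, groups):
    """Estimate omega^2 from presence/absence samples and group labels."""
    n = len(samples)
    labels = sorted(set(groups))
    a = len(labels)
    # Squared Jaccard distance for every pair of samples (i < j).
    d2 = {(i, j): jaccard_distance(samples[i], samples[j]) ** 2
          for i, j in combinations(range(n), 2)}
    ss_total = sum(d2.values()) / n
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        pairs = combinations(members, 2)
        ss_within += sum(d2[pair] for pair in pairs) / len(members)
    ss_among = ss_total - ss_within
    ms_within = ss_within / (n - a)
    return (ss_among - (a - 1) * ms_within) / (ss_total + ms_within)


# Hypothetical example: two treatments with three replicates each.
samples = [
    {"t1", "t2", "t3"}, {"t1", "t2", "t4"}, {"t1", "t3", "t4"},
    {"t5", "t6", "t7"}, {"t5", "t6", "t8"}, {"t5", "t7", "t8"},
]
groups = ["control"] * 3 + ["treated"] * 3
omega2 = permanova_omega2(samples, groups)
print(f"omega^2 = {omega2:.2f}")
```

The toy groups share no taxa, so the resulting effect size is far larger than the 'High' range used in the analysis above. Real soil communities overlap much more, which is why replication matters most for small effects.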