2. Technical considerations in a heterogeneous and diverse habitat 

The diversity of microorganisms in soil is well documented as a major challenge in studying soil microbial communities \citep{Gans_2005,Fierer2006}. A single gram of soil is estimated to contain $10^{8}$--$10^{9}$ cells \citep{Bloem1995,Nunan_2001} and tens of thousands of microbial taxa \citep{Roesch2007}. Additionally, compared to host-associated microbiomes (e.g., gut, skin, or plant root microbiomes), free-living bacteria exhibit higher levels of diversity. In a recent comparison of alpha-, beta- and gamma-diversity in samples collected as part of the Earth Microbiome Project (EMP), soils had the highest alpha-diversity of all environments \citep{Walters_2020}. In terms of beta- and gamma-diversity, soil was second only to sediment samples. Fewer studies have investigated the diversity and global distribution of fungi \citep{Tedersoo_2014,V_trovsk__2019}. These studies indicate that more heterogeneous environments, such as soils and sediments, may contain more diverse fungal communities than more homogeneous habitats (e.g., marine, freshwater, air, biofilms) \citep{Fierer_2011,Walters_2020,Torsvik_2002}.
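
For reference, the three scales of diversity mentioned above can be related through Whittaker's multiplicative partitioning. This is one common convention, given here only as an illustrative definition; the cited comparison may have relied on different estimators, and the symbols below are introduced for illustration rather than taken from the cited studies:
\[
\gamma = \bar{\alpha} \times \beta, \qquad \bar{\alpha} = \frac{1}{N} \sum_{i=1}^{N} \alpha_i ,
\]
where $\alpha_i$ is the richness (number of taxa) observed in sample $i$, $N$ is the number of samples from a given environment, $\gamma$ is the total richness across those samples, and $\beta$ expresses the compositional turnover among them. Under this convention, an environment can rank highest in $\bar{\alpha}$ without ranking highest in $\beta$ or $\gamma$.
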
In addition to high biological diversity, researchers interested in the microbial composition of soils are confronted with technical challenges throughout the sample processing workflow. The general workflow of amplicon sequencing includes: 1) planning and implementation of the experimental design, 2) nucleic acid extraction (including quality control), 3) primer choice, PCR amplification, and sequencing, 4) processing and analysis of sequence data, and 5) data interpretation (Fig. 2). At each of these steps, a subset of the sample is selected and information can be lost as a result of the techniques applied (i.e., nucleic acid extraction method, primer selection, statistical approaches), with consequences for data interpretation in the context of ecological questions \citep{Morton_2019,McLaren_2019}. As with any scientific experiment, the specific hypotheses to be addressed should determine the experimental design. Beyond this, in experiments involving amplicon sequencing, one must consider the appropriate spatial scale (i.e., aggregate/microscale, centimetre scale, metre scale) and the frequency of sampling in order to address specific questions regarding community dynamics. While the sample that is sequenced represents the specific moment in time when it was frozen or extracted, the presence of exogenous or relic DNA in soil samples has the potential to influence the observed community composition and downstream data interpretation (\citealt{Lennon2018,Carini2016}; discussed in section 5). Additionally, sample replication remains a critical concern in soil studies, particularly for statistical inference and the construction of co-occurrence networks (discussed in sections 5 and 6).
The physicochemical properties of soils make nucleic acid extraction from this matrix particularly challenging. Numerous extraction protocols and kits have been developed to circumvent challenges with DNA extraction from soil; however, each method introduces distinct biases in the subset of the microbial community retrieved \citep{Terrat_2011,Zieli_ska_2017,Dopheide_2018}. Inhibitors, such as humic substances, are common in soil and can reduce the quality and purity of the extracted nucleic acids and decrease the efficiency of reverse transcription and/or PCR reactions \citep{Schrader2012}. In addition to the nucleic acid extraction method of choice (chemical or physical lysis, DNA and/or RNA extraction), primer selection dictates the organisms or functions targeted by the approach (phylogenetic or functional marker; see Table S1). Finally, due to the diversity and heterogeneity of soil samples, the resulting data are often sparse, containing numerous taxa with low abundance and prevalence, which may be dealt with through filtering thresholds or statistical approaches (see section 3). The loss of information at each step of the process, from sampling to analysis, must be carefully considered when interpreting amplicon sequencing data. Even with all these factors in mind, the application of sequencing technologies to soil has provided invaluable information regarding the structure of soil microbial communities and the critical importance of understanding them.
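
As an illustration of what a filtering threshold for sparse data can look like, the following sketch defines the prevalence and mean relative abundance of a taxon from a count table. The symbols and the cut-offs $p_{\min}$ and $a_{\min}$ are assumptions introduced here for illustration; they are not values recommended by the cited studies:
\[
p_j = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left[ x_{ij} > 0 \right], \qquad
a_j = \frac{1}{N} \sum_{i=1}^{N} \frac{x_{ij}}{\sum_{k} x_{ik}} ,
\]
where $x_{ij}$ is the read count of taxon $j$ in sample $i$, $N$ is the number of samples, and every sample is assumed to contain at least one read. Taxon $j$ would be retained only if $p_j \geq p_{\min}$ and $a_j \geq a_{\min}$. For example, with $N = 10$ samples, a taxon detected in two samples has $p_j = 0.2$ and would be removed under a hypothetical $p_{\min} = 0.3$.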