Simulations
Running the intercept method on simulated data allowed us to assess the expected accuracy as a function of the nuclear genome size and the expected mapping depth to the extranuclear reference due to insert sequences. The model fit is summarised in Table 1 and Figure S1. We then used this model to predict whether a given dataset is sufficient to generate an accurate intercept estimate. Figure 5 summarises the effects of genome size, vagrant DNA proportion, and sequencing depth on estimation accuracy. All the datasets we analysed (letters in Figure 5) lie in a region of the parameter space where high accuracy is expected.
We also ran simulations in which insertion rates varied along the extranuclear reference. The intercept method was robust to this variation. Further information on how to deal with potentially unequal insertion rates is given in the tutorial in the Supplemental Material.