Simulations
Running the intercept method on simulated data allowed us to assess the
expected accuracy as a function of nuclear genome size and of the expected
mapping depth to the extranuclear reference contributed by inserted sequences.
The model fit is summarised in Table 1 and Figure S1. We then used this
model to predict whether a dataset is sufficient to generate an accurate
intercept estimate. Figure 5 summarises the effects of genome size,
vagrant DNA proportion, and sequencing depth on estimation accuracy. All
the datasets we analysed (letters in Figure 5) are in a region of the
parameter space where high accuracy is expected.
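Simple arithmetic shows how genome size, vagrant DNA proportion and sequencing depth combine into the mapping depth that inserts contribute to the extranuclear reference. The sketch below is not the fitted model of Table 1. It rests on stated assumptions: vagrant DNA makes up a fraction p of a nuclear genome of size G; insert copies are spread evenly over an extranuclear reference of length L and all map back to it; and the nuclear genome is sequenced to fold coverage c. Under these assumptions, the expected insert-derived depth is c·p·G/L. The function and parameter names are hypothetical.

```python
def expected_insert_depth(genome_size: float, vagrant_fraction: float,
                          nuclear_coverage: float, reference_length: float) -> float:
    """Expected per-base depth on the extranuclear reference from nuclear inserts.

    Hypothetical helper. Assumes inserts are spread uniformly over the
    reference and that every insert-derived read maps back to it.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    # Total number of vagrant (inserted) bases in the nuclear genome.
    vagrant_bases = vagrant_fraction * genome_size
    # Each vagrant base is sequenced nuclear_coverage times on average;
    # spreading those reads over the reference gives the mean depth.
    return nuclear_coverage * vagrant_bases / reference_length


# Example: 1 Gb genome, 0.01 % vagrant DNA, 10x nuclear coverage,
# 100 kb extranuclear reference (about 10x insert-derived depth).
print(expected_insert_depth(1e9, 1e-4, 10, 1e5))
```

If sequencing effort is instead expressed as total sequenced bases B, then c = B/G and the expected depth simplifies to B·p/L. Under that parameterisation, genome size affects the insert-derived depth only through the vagrant fraction.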
We also ran simulations in which insertion rates varied along the
extranuclear reference, and the intercept method proved robust to this
variation. Further information on how to deal with potentially unequal
insertion rates is given in the tutorial in the Supplemental Material.