2. ANI timings are simply wrong. Therefore the TOC and Figure 4 are misleading. ANI timing is at least 100 times faster. The author's script re-loads all Python dependencies and compiles the neural network model for every conformer. This takes 2.45 out of 2.5 seconds of the run. Even with sequential energy evaluation on a CPU, it should be around 0.05s for the 2x model and probably ~0.025s for 1x/ccx. … Therefore the recommended use is to load all conformers and evaluate them at once.
We have discussed this point with the ANI developers repeatedly. Many of the quantum codes also require time to load code libraries, create atomic orbital basis sets, etc. Programs like ORCA have a wide variety of options that change performance (e.g., DFT grid size or the use of RI approximations). Furthermore, many force fields perform atom typing once per molecule and also have a fixed-time overhead.
Our timings for all methods exclude the time spent loading Python dependencies or launching a program. Traditional quantum chemical programs report the time for the calculation itself, and we have used the same measure in our scripts for ML methods; in this sense, creating a neural network model is similar to creating an initial guess for the self-consistent field iterations. We believe we provide a fair comparison for single-point energy evaluations across all our methods. If the ANI authors feel a different comparison is worthwhile, they are welcome to publish a rebuttal or alternative benchmark; our data and scripts are all online.
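To make the distinction between timed and untimed portions concrete, the following is a minimal sketch rather than our actual benchmark script. The names `timed` and `build_model` are hypothetical, and the "model" is a toy stand-in, not ANI or any real package. It reports the setup and evaluation times separately, so that either accounting convention can be read off.

```python
import time

def timed(func, *args):
    """Run func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

def build_model():
    # Hypothetical stand-in for per-calculation setup (e.g. constructing a model).
    weights = [0.5, -1.0, 2.0]
    return lambda coords: sum(w * x for w, x in zip(weights, coords))

# Imports and interpreter start-up have already happened here, so they are not timed.
model, setup_seconds = timed(build_model)
energy, eval_seconds = timed(model, [1.0, 2.0, 3.0])

# Reported separately: total includes setup, eval_seconds excludes it.
total_seconds = setup_seconds + eval_seconds
```

Whether `setup_seconds` belongs in the reported figure is the point under discussion; the sketch only shows that the two quantities can be measured independently of interpreter and import time.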
In the revised manuscript, we have added a section on "batch evaluation" for force field and ANI methods. We note this changes neither the accuracy of the methods nor our resulting conclusions. It only somewhat "bends the curve", since the force field and ANI methods both improve in speed. Because the x-axis of our figure is logarithmic, and force field methods are still 200-300x faster, the qualitative descriptions remain similar.
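As a rough illustration of why batch evaluation shifts the timings, the arithmetic can be sketched as follows. The 2.45 s overhead and 0.05 s per-conformer figures are the reviewer's estimates quoted above, not our measurements; the function name `per_conformer_seconds` is hypothetical, and the sketch assumes purely sequential evaluation with a single fixed setup cost per batch.

```python
def per_conformer_seconds(n_conformers, overhead=2.45, evaluation=0.05):
    """Average wall time per conformer when a fixed overhead is paid once per batch.

    overhead and evaluation default to the reviewer's estimates (assumed values).
    """
    if n_conformers < 1:
        raise ValueError("n_conformers must be at least 1")
    return (overhead + n_conformers * evaluation) / n_conformers

# One conformer per run: the full overhead is charged to every evaluation.
single = per_conformer_seconds(1)      # about 2.5 s

# 100 conformers per run: the overhead is spread over the whole batch.
batched = per_conformer_seconds(100)   # about 0.0745 s

speedup = single / batched
```

Under these assumptions the per-conformer cost approaches the bare evaluation time as the batch grows, which is the effect the batch-evaluation section describes; the same reasoning applies to the fixed atom-typing overhead of force fields.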