How Quantum Chemistry Fills the AI Gap

Quantum chemistry is accelerating drug discovery AI by providing physics-based, reproducible, and scalable data that encodes molecular reality.

There is a question ringing throughout the pharmaceutical industry: how far will AI take us? You can hear it in the $17 billion worth of AI partnership deals announced by pharma in the last year.1-10 You can hear it in the almost weekly LinkedIn announcement revealing the latest AI scientist company. You can hear it in the quiet skepticism of chemists and biologists who, despite spending valuable funding on multiple approaches, still can’t find a model that works for their target.

If you’ve read a review on AI in drug discovery, you’re probably familiar with its primary challenges. Lack of quality data. Lack of explainability. And lack of generalisation.11

Meanwhile, another in silico method has been advancing quietly but rapidly in mutual symbiosis with AI: quantum chemistry. For a century, it’s been theoretically possible to calculate the behaviour of all chemical systems, but computationally impossible to run those calculations ab initio at biologically relevant scales. The development of highly parallelisable algorithms and GPUs, accelerated for the purposes of AI, has helped change this. At the end of 2024, QDX’s research team demonstrated that quantum chemistry calculations can be run 3,568x faster than the previous state of the art, simulating the ab initio dynamics of a system with over 1,400x more electrons than previously modelled at that level of accuracy.12

The biologists in the room should be pointing out that mutual symbiosis requires a benefit to go both ways. The AI wave has catalysed rapid development of GPUs into powerful, general-purpose parallel processors, directly increasing the number of quantum chemistry calculations that can be performed at once. In return, quantum chemistry at scale is able to provide an ideal data engine for many drug-discovery-focused AI models. With recent advances, data generated using quantum chemistry is cheaper and more scalable than experimental data; it’s reproducible (no random noise between experiments); it’s not limited to what has previously been discovered; it encodes the physics of the system, rather than just teaching the model statistical correlations; and it’s been found to reduce the number of data points required to train an accurate AI model.

Say we wanted to train an AI model to predict binding affinity. High-quality experimental data is scarce, expensive to produce, and notoriously inconsistent. As demonstrated in Figure 1, Landrum and Riniker found that combining IC50 or Ki values from different assays introduces significant noise into the dataset.13 Or as one computational chemist said to me with regard to an AI model: ‘rubbish in, rubbish out.’
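
To make the noise problem concrete, here is a minimal Python sketch of how disagreement between assays might be quantified before pooling data. The assay names and IC50 values are invented for illustration and are not taken from Landrum and Riniker; only the pIC50 definition is standard.

```python
import math

def pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar)."""
    return -math.log10(ic50_nm * 1e-9)

# Hypothetical IC50 values (nM) reported for ONE compound by three assays.
measurements_nm = {"assay_A": 12.0, "assay_B": 85.0, "assay_C": 30.0}

p_values = {name: pic50(value) for name, value in measurements_nm.items()}

# Spread between the most and least potent reading, in log units.
spread = max(p_values.values()) - min(p_values.values())

for name, p in p_values.items():
    print(f"{name}: pIC50 = {p:.2f}")
print(f"spread across assays: {spread:.2f} log units")
```

A spread of one log unit corresponds to a tenfold disagreement in IC50 for the same compound, so a model trained on pooled values has to absorb that uncertainty in its labels.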

On the other hand, quantum chemistry enables physics-based calculations of binding energy with high levels of accuracy.15,19 These calculations are embarrassingly parallelisable and, with recent speedups in quantum chemistry software, significantly cheaper than experimental assays per compound. It’s far easier and more affordable to rent compute than to rent lab space.
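
‘Embarrassingly parallelisable’ means that each compound’s calculation is independent of every other, so the jobs can be dispatched all at once. The sketch below illustrates that pattern only: placeholder_binding_energy and the ligand identifiers are hypothetical stand-ins, and no real quantum chemistry is performed.

```python
from concurrent.futures import ThreadPoolExecutor

def placeholder_binding_energy(ligand_id: str) -> float:
    """Stand-in for one independent QM job; NOT a real calculation.

    A real pipeline would launch a quantum chemistry engine here.
    """
    return -0.1 * len(ligand_id)

# Hypothetical ligand identifiers.
ligands = ["lig_001", "lig_002", "lig_0003"]

# No job depends on another, so they can all be submitted together.
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map returns results in the same order as the inputs.
    energies = list(pool.map(placeholder_binding_energy, ligands))

results = dict(zip(ligands, energies))
print(results)
```

Threads are used here only to keep the example short; in practice the same fan-out would more likely be spread across GPUs or compute nodes.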

Beyond economics and convenience, quantum chemistry can generate forms of data that are impossible to measure experimentally or compute classically. If you want to train a model to predict transition state geometries, quantum chemistry provides the only way to systematically generate the necessary 'ground truth' data. If you want to model charge transfer, you need quantum chemistry. If you want to model how a drug interacts with metal ions, you need quantum chemistry.

Quantum chemistry also changes the type and quality of inputs we can feed to AI models. Feature generation and selection can play key roles in improving a machine learning model’s predictive precision and generalisation.16,20 What features of the molecule do we want to feed to the AI model? Which will be most informative for its pattern recognition? Currently, most models are trained on classical data, which doesn’t encode any actual description of the physics of the molecule. Efficient quantum data generation unlocks a new feature set for chemical AI; we move from training models on topological proxies (atoms and bonds) to training them on fundamental physical descriptors. Providing an AI with electron density maps, vibrational modes, and atomisation energies allows it to learn the underlying 'energy landscape' of a molecule. This can provide an entirely different type of information.

Consider for a moment a more intuitive scenario. Say we were training an AI model to predict heart attacks:


1) Experimental data is comparable to feeding the model information about people who have had heart attacks. It may look at a million patients and learn that people who wear expensive watches and live in cold climates are likely to have heart attacks. These things aren’t actually the cause of heart attacks, but wearing expensive watches may correlate with having a stressful job, and living in a cold climate may correlate with getting less exercise. The model will work sometimes but miss anyone in a warm climate having a heart attack without an expensive watch.

2) Data computed using classical chemistry is based on a simplification of reality that changes depending on the forcefield used. In our analogy, we could compare it to feeding a model lots of textbook drawings of healthy and unhealthy hearts. It may learn how to predict which drawn hearts are unhealthy; but the style of drawing makes a difference, and the things it learns may not match what’s happening with real hearts. It’s ‘hit or miss’ how well it extends to the real world.

3) Now say we train the model on high-information descriptors that encode the actual physical causes of the heart attack: blood viscosity, arterial wall tension, and calcium density. Because these root causes are generalisable across heart attacks, it makes sense that you’d need far fewer examples for the model to get the idea. And once it learns how those features combine to cause heart attacks, it can be expected to accurately predict heart attacks in almost all hearts it may encounter.

Accordingly, quantum mechanical (QM) descriptors which encode the fundamental physics of chemical systems have been found to improve model performance in some scenarios, particularly when the dataset size is small.17 This means fewer data points are required for QM-augmented models to reach the same level of accuracy as classical models. As shown in Figure 2, they also improve generalisability; information on electronic structure is universally applicable across all of chemistry, while classical data is tethered to atom-types and empirical force-field parameters. These local shortcuts fail when the model encounters unfamiliar molecular architectures, whereas QM-trained models make accurate predictions even when encountering molecules with atoms that weren’t in their training set.18

Another example of QM providing unique data is in covalent binders. By running QM/MM dynamics on entire protein-ligand systems, we can model proposed reaction mechanisms and extract data that experiments cannot provide: transition state geometries and reaction barriers. The transition state (the highest-energy configuration that determines whether a reaction proceeds) cannot be directly observed experimentally, but QM calculations can characterize it. The barrier height tells us about reactivity: a warhead that's too reactive risks hitting off-target residues, while one that's not reactive enough would fail to label the protein efficiently. When multiple mechanisms are plausible, we can compare them and identify which is most energetically favorable. This mechanistic understanding helps medicinal chemists make informed design decisions rather than iterating through a large number of designs. Classical simulations cannot provide any of this; with bonds fixed, there is no reaction to model.
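
To see why barrier height matters so much, the Eyring equation from transition state theory relates a free-energy barrier to a rate constant. The sketch below is a textbook illustration rather than a description of any particular QM/MM workflow: the barrier values are hypothetical, and the transmission coefficient is assumed to be 1.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)
KCAL_TO_J = 4184.0   # joules per kilocalorie

def eyring_rate(barrier_kcal_mol: float, temperature_k: float = 298.15) -> float:
    """First-order rate constant (1/s) from a free-energy barrier.

    Assumes a transmission coefficient of 1.
    """
    barrier_j = barrier_kcal_mol * KCAL_TO_J
    return (K_B * temperature_k / H) * math.exp(-barrier_j / (R * temperature_k))

def half_life_s(rate: float) -> float:
    """Half-life in seconds of a first-order process with the given rate."""
    return math.log(2) / rate

# Hypothetical warhead barriers in kcal/mol.
for barrier in (16.0, 20.0, 24.0):
    k = eyring_rate(barrier)
    print(f"{barrier:.0f} kcal/mol: k = {k:.3e} 1/s, half-life = {half_life_s(k):.3e} s")
```

Because the dependence is exponential, a difference of roughly 1.4 kcal/mol in the barrier changes the rate about tenfold at room temperature, which is why small errors in a computed barrier translate into large differences in predicted reactivity.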

Training better models on new forms of data is valuable. But in drug discovery, knowing that a molecule binds isn't enough: medicinal chemists need to know why. This is where quantum chemistry offers something AI alone cannot.

The deeper advantage separating QM from purely statistical approaches is that calculating from first-principles physics gives you an explanation along with your answer. This level of interpretability is important for medicinal chemists trying to refine their molecule based on computational results, and it is largely absent from purely statistical AI models. Take the example of binding affinity again: we can measure binding affinity experimentally, or we can train an AI model to predict binding affinity. But neither of these methods tells scientists what’s driving that binding. You get an answer, but not much insight.

Instead, if we use a physics-based approach, such as ab initio QM to calculate relative binding free energy, we can extract a more holistic view of the components that impact binding affinity. Such a calculation also tells us how each residue of the protein interacts with different components of the ligand, and how much each interaction contributes to the total binding energy, whereas AI models typically give only a single consolidated binding affinity prediction for each ligand that cannot be easily dissected. With these quantitative measures of how a design change affects binding, medicinal chemists are better placed to make informed decisions during lead optimization.
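
As a toy illustration of what ‘dissecting’ a binding energy can look like, the sketch below compares two ligands residue by residue. The residue labels and energies are invented, and the total is assumed to be a simple sum of per-residue terms; a real calculation would also involve many-body, solvation and entropic contributions that this ignores.

```python
# Hypothetical per-residue interaction energies (kcal/mol); negative = favourable.
ligand_a = {"ASP86": -6.2, "LYS33": -3.1, "PHE80": -1.8, "GLU12": 0.9}
ligand_b = {"ASP86": -6.0, "LYS33": -4.4, "PHE80": -1.7, "GLU12": 0.4}

def total_energy(contributions: dict) -> float:
    """Total interaction energy, ASSUMING per-residue terms simply add up."""
    return sum(contributions.values())

def per_residue_change(before: dict, after: dict) -> dict:
    """Change in each residue's contribution when going from one ligand to another."""
    return {residue: after[residue] - before[residue] for residue in before}

delta = per_residue_change(ligand_a, ligand_b)

print(f"total A: {total_energy(ligand_a):.1f} kcal/mol")
print(f"total B: {total_energy(ligand_b):.1f} kcal/mol")

# Sort so the residue that gained the most favourable interaction comes first.
for residue, change in sorted(delta.items(), key=lambda item: item[1]):
    print(f"{residue}: {change:+.1f} kcal/mol")
```

A single predicted affinity would only show that the second ligand scores better; the per-residue view shows which interaction accounts for most of the difference.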


These examples highlight a unique strength of quantum chemistry: not just prediction, but explanation. When it comes to training AI models, this matters too: as previously discussed, QM-derived features encode physical reality in ways that improve generalisation. Of course, it’s important to note that quantum chemistry isn’t the ideal data generation engine for all AI models. There are cases where data generated from the real world is necessary to encode statistical correlations; most obviously, there’s no way of calculating clinical outcomes or complex biological interactions from first principles alone.

But for properties that can be linked to the fundamental physics of a system, it seems almost obvious that feeding a model information about the physics is a more effective and generalisable training method than any approximation. For a long time, it’s been impossible to generate quantum chemistry data at enough scale to fully leverage this. With the progress made in just the last few years, we have reached an inflection point. For decades, quantum chemistry has been the domain of specialists running week-long calculations on single molecules. That era is ending. As QM data generation becomes fast and cheap enough to feed AI at scale, we're not just improving existing models; we're unlocking an entirely new paradigm for computational drug discovery. The companies that recognise this shift early will have a significant head start.

The question ringing through pharma—how far will AI take us?—may be missing the point. AI alone will take us as far as our data allows; no further. The more interesting question is: how far can AI take us when we finally give it data that encodes reality?

None of this is to say quantum chemistry will replace experimental data, or that AI trained on QM descriptors will solve drug discovery overnight. But for the specific, critical problem of training models that generalise beyond their training set and capture why molecules behave the way they do, physics-based data generation may be the missing piece. The infrastructure to generate it at scale finally exists. Now it's a matter of using it.

References

  1. NVIDIA; Eli Lilly and Company. NVIDIA and Lilly Announce Co-Innovation AI Lab to Reinvent Drug Discovery In the Age of AI. Indianapolis: Eli Lilly and Company; 2026 Jan 12. https://investor.lilly.com/news-releases/news-release-details/nvidia-and-lilly-announce-co-innovation-ai-lab-reinvent-drug
  2. Insilico Medicine. Insilico Medicine announces US$888 million multi-year collaboration with Servier for drug discovery and development in oncology. Cambridge (MA): Insilico Medicine; c2026. https://insilico.com/news/u4cbsok2s1-insilico-medicine-announces-us888-millio
  3. AstraZeneca. AstraZeneca enters into collaboration with CSPC. Cambridge (UK): AstraZeneca; c2025. https://www.astrazeneca.com/media-centre/press-releases/2025/astrazeneca-enters-into-collaboration-with-cspc.html
  4. Superluminal Medicines. Superluminal Medicines announces collaboration with Eli Lilly and Company to advance small molecule therapeutics for cardiometabolic diseases and obesity. Boston: PR Newswire; 2025 Aug 14. https://www.prnewswire.com/news-releases/superluminal-medicines-announces-collaboration-with-eli-lilly-and-company-to-advance-small-molecule-therapeutics-for-cardiometabolic-diseases-and-obesity-302529689.html
  5. Orionis Biosciences. Orionis Biosciences announces strategic partnership with Genentech. Boston (MA): Orionis Biosciences; 2025 May 21. https://orionisbio.com/2025/05/genentech-collaboration-2025/
  6. Creyon Bio, Inc. Creyon Bio and Lilly enter into RNA-targeted oligo therapy development collaboration. San Diego (CA): Creyon Bio; 2025 Apr 29. https://creyon.com/news/creyon-bio-and-lilly-enter-into-rna-targeted-oligo-therapy-development-collaboration/
  7. Nabla Bio, Inc. Nabla Bio signs second Takeda collaboration to advance AI-driven design of protein therapeutics. Cambridge (MA): Business Wire; 2025 Oct 14. https://www.businesswire.com/news/home/20251014934240/en/Nabla-Bio-Signs-Second-Takeda-Collaboration-to-Advance-AI-Driven-Design-of-Protein-Therapeutics
  8. Relation Therapeutics. Relation announces strategic collaboration with Novartis to advance therapeutics for atopic diseases. London: GlobeNewswire; 2025 Dec 9. https://www.globenewswire.com/news-release/2025/12/09/3202076/0/en/Relation-announces-strategic-collaboration-with-Novartis-to-advance-therapeutics-for-atopic-diseases.html
  9. Halda Therapeutics. VantAI and Halda Therapeutics forge alliance to discover next-generation RIPTAC medicines. New Haven (CT): Halda Therapeutics; 2025 Aug 19. https://haldatx.com/vantai-and-halda-therapeutics-forge-alliance-to-discover-next-generation-riptac-medicines/
  10. Earendil Labs. Earendil Labs announces worldwide exclusive license agreement with Sanofi for next-generation bispecific antibodies for autoimmune and inflammatory bowel diseases. Middletown (DE): PR Newswire; 2025 Apr 17. https://www.prnewswire.com/news-releases/earendil-labs-announces-worldwide-exclusive-license-agreement-with-sanofi-for-next-generation-bispecific-antibodies-for-autoimmune-and-inflammatory-bowel-diseases-302431020.html
  11. Bhat, A. R., & Ahmed, S. (2025). Artificial intelligence (AI) in drug design and discovery: A comprehensive review. In Silico Research in Biomedicine, 1, 100049. https://doi.org/10.1016/j.insi.2025.100049
  12. Stocks, R., Galvez Vallejo, J. L., Yu, F. C. Y., Snowdon, C., Palethorpe, E., Kurzak, J., Bykov, D., & Barca, G. M. J. (2024). Breaking the million-electron and 1 EFLOP/s barriers: Biomolecular-scale ab initio molecular dynamics using MP2 potentials. SC24: International Conference for High Performance Computing, Networking, Storage and Analysis. https://doi.org/10.1109/SC41406.2024.00015
  13. Landrum, G. A., & Riniker, S. (2024). Combining IC50 or Ki values from different sources is a source of significant noise. Journal of Chemical Information and Modeling, 64(5), 1560–1567. https://doi.org/10.1021/acs.jcim.4c00049
  14. Toniato, A., Unsleber, J. P., Vaucher, A. C., Weymuth, T., Probst, D., Laino, T., & Reiher, M. (2023). Quantum chemical data generation as fill-in for reliability enhancement of machine-learning reaction and retrosynthesis planning. Digital Discovery, 2(3), 663–673. https://doi.org/10.1039/D3DD00006K
  15. Thapa, B., Beckett, D., Erickson, J., & Raghavachari, K. (2018). Theoretical study of protein–ligand interactions using the molecules-in-molecules fragmentation-based method. Journal of Chemical Theory and Computation, 14(10), 5143–5155. https://doi.org/10.1021/acs.jctc.8b00531
  16. Theng, D., & Bhoyar, K. K. (2024). Feature selection techniques for machine learning: a survey of more than two decades of research. Knowledge and Information Systems, 66(3), 1575–1637. https://doi.org/10.1007/s10115-023-02010-5
  17. Li, S.-C., Wu, H., Menon, A., Spiekermann, K. A., Li, Y.-P., & Green, W. H. (2024). When do quantum mechanical descriptors help graph neural networks to predict chemical properties? Journal of the American Chemical Society, 146(33), 23103–23120. https://doi.org/10.1021/jacs.4c04670
  18. Shen, Z., Yang, Y., Sparrow, Z. M., Ernst, B. G., Quady, T. K., Kang, R., Lee, J., Yang, Y., Tu, L., & DiStasio, R. A., Jr. (2025). Learning molecular conformational energies using semilocal density fingerprints. The Journal of Physical Chemistry Letters, 16(51), 13083–13092. https://doi.org/10.1021/acs.jpclett.5c02222
  19. Pecina, A., Fanfrlík, J., Lepšík, M., Řezáč, J., Hobza, P., & Bronowska, A. K. (2024). SQM2.20: Semiempirical quantum-mechanical scoring function yields DFT-quality protein–ligand binding affinity predictions in minutes. Nature Communications, 15(1), 1127. https://doi.org/10.1038/s41467-024-45431-8
  20. Markovitch, S., & Rosenstein, D. (2002). Feature generation using general constructor functions. Machine Learning, 49(1), 59–98. https://doi.org/10.1023/A:1014046307775