Predictive Modeling for Small-Molecule Formulation Development Using Advanced Algorithms

Published on: October 5, 2025

Cynthia A. Challener

Pharmaceutical Technology, October 2025, Volume 49, Issue 8

Pages: 14–17

Artificial intelligence and machine learning are anticipated to boost success rates.

Advanced drug discovery tools are leading to the identification of increasingly complex small molecules whose physicochemical properties lend promising biologic and pharmacologic activity while simultaneously creating tremendous challenges in developing formulations capable of delivering the drug to its intended site of action. Many different molecular properties, acting alone and in combination with one another and with different formulation excipients and buffers, influence the pharmacokinetics and pharmacodynamics of APIs. Investigating and identifying potential formulations for these complex molecules with traditional empirical methods requires excessive time and resources just when reducing cost and accelerating development are essential to success. Formulation scientists are turning to predictive modeling approaches and platform formulation processes that leverage artificial intelligence (AI) and machine learning (ML) to more rapidly and comprehensively evaluate complicated formulation spaces, including complex molecules and advanced drug delivery technologies (1–4).

Moving from academia to industrial application

The use of AI/ML for industrial pharmaceutical formulation development is at a nascent stage. “Much of the progress so far has come from academic research, which often seeds future industry innovation, as well as from emerging startups and the steady improvement of open-source models,” says Pauric Bannigan, chief science officer and cofounder of Intrepid Labs.

“Interest is growing rapidly, fueled by increasing recognition of AI/ML’s potential to navigate high-dimensional formulation design spaces, optimize excipient selection, and predict stability or release performance before extensive lab work,” Bannigan observes. Given these developments and the rapid advances in lab automation currently being made, he believes it is only a matter of time before AI and ML become widely applied to formulation development.

How are growing formulation challenges overcome?

Small-molecule drug developers face several challenges, and many of them are well-suited to being addressed using AI/ML approaches, according to Bannigan.

Top of the list, says Daniel Joseph Price, head of the Excipients Business and working within Process Solutions in the Life Science business of Merck KGaA, Darmstadt, Germany, is the poor solubility of drug substances, with approximately 70–90% of drugs currently in the pipeline suffering from this problem (5). “This challenge is pronounced for newer drug candidates that often surpass Lipinski’s Rule of Five (6), which sets foundational criteria for oral bioavailability. In this context, overcoming solubility issues to ensure effective oral drug delivery and hence enhancing patient acceptance remains paramount,” Price emphasizes.

Other small-molecule formulation challenges highlighted by Bannigan include achieving precise release kinetics for controlled- and sustained-release systems, identifying robustness early for scale-up and manufacturability, and minimizing API waste during screening. “Each of these challenges is a multidimensional problem where dozens of formulation components and process parameters interact in nonlinear ways,” he says.

Formulations that address these key issues often involve multiple excipients. Different combinations of excipients must thus be investigated along with different formulation concentrations and processing parameters (e.g., mixing speed and time, temperature control, compression force, and drying method and rate) to identify the optimum formulation design. Even with tight constraints, the exploration space can be enormous, according to Bannigan.

Pharmaceutical formulation, Bannigan adds, is a high-dimensional, sparse-data challenge where dozens of excipients and process parameters interact in nonlinear ways. As an example, he notes that with three excipients, five possible concentrations, and five process parameters at three settings each, the total number of possible unique formulations would be 3,645,000. And that would be a conservative case. “In reality, formulations often use more excipients in varying ratios with additional process parameters, pushing the possibilities into the tens or hundreds of millions,” he says. Exploring them all is simply not possible using conventional approaches, including predictive modeling systems leveraging conventional statistical methods.
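The arithmetic behind a figure of that size can be reproduced with a short calculation. The article does not say how the 3,645,000 total was derived; the count matches if the three excipients are chosen from a pool of 10 candidates, which is an assumption made for this sketch and not a detail from the source. The pool size and all names in the code are illustrative.

```python
from math import comb

# ASSUMPTION (not stated in the article): excipients are picked from a pool of 10.
EXCIPIENT_POOL = 10
N_EXCIPIENTS = 3        # excipients per formulation (from the article)
N_CONCENTRATIONS = 5    # concentration levels per excipient (from the article)
N_PROCESS_PARAMS = 5    # process parameters (from the article)
N_SETTINGS = 3          # settings per process parameter (from the article)


def count_formulations(pool, k, levels, params, settings):
    """Count unique formulations in a full-factorial design space."""
    excipient_sets = comb(pool, k)        # which excipients are used
    concentration_combos = levels ** k    # one level per chosen excipient
    process_combos = settings ** params   # one setting per process parameter
    return excipient_sets * concentration_combos * process_combos


total = count_formulations(EXCIPIENT_POOL, N_EXCIPIENTS, N_CONCENTRATIONS,
                           N_PROCESS_PARAMS, N_SETTINGS)
print(f"{total:,}")  # 3,645,000
```

Under the same assumptions, fixing the three excipients in advance leaves 125 × 243 = 30,375 combinations, so most of the growth comes from the choice of which excipients to include.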

The promise of AI/ML

In fact, Bannigan observes that without AI/ML, countless breakthrough formulations, and more importantly the medicines they could enable, will remain buried in the noise, invisible to traditional trial-and-error approaches. “AI/ML can cut through this complexity by modeling these interactions, learning from each experimental result, and steering exploration toward the most informative and promising regions of the design space,” he contends.

Specifically, Bannigan notes, AI/ML can map complex, multi-objective relationships between components and performance; identify promising solutions that human intuition might overlook; and adapt in real time as new experimental data arrives through active learning. “By efficiently navigating vast combinatorial spaces, AI/ML not only accelerates the discovery of robust, scalable formulations, it reduces API waste, increases the probability of hitting target product profiles, and brings hidden, high-potential medicines into view,” he observes.

The application of AI/ML in formulation development offers a transformative approach to current formulation challenges, agrees Price. “By using AI/ML, scientists can streamline the formulation process, effectively narrowing down the design space. This allows them to concentrate on formulations and excipients that are likely to have a significant impact, rather than getting lost in exhaustive trial-and-error testing.”

“Furthermore, AI/ML enables researchers to explore a wider array of options, facilitating innovative solutions without the extensive time investment typically required for traditional formulation trials,” Price states. For instance, he anticipates companies harnessing AI/ML technologies to investigate broader ranges of chemicals and solutions by trying out things that go beyond the obvious, fostering greater innovation in drug formulation. “This strategic shift could ultimately lead to more effective therapies and improved patient outcomes,” he says.

Improving predictive formulation modeling

Formulation development has traditionally relied on mechanistic or simplified models to guide decision-making and optimize outcomes, according to Price. Examples include the use of the Arrhenius equation to predict the stability of drug substances under accelerated conditions, or Lipinski’s Rule of Five, which provides heuristic understanding for assessing the drug-likeness and oral bioavailability of small molecules. These models, which are rooted in physical chemistry and empirical rules, have delivered significant value—often requiring limited data to produce accurate and interpretable results for well-understood systems, Price notes.
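As a minimal sketch of the kind of mechanistic calculation Price describes, the Arrhenius equation, k = A·exp(−Ea/(R·T)), can be used to estimate how much faster a degradation reaction runs at an accelerated storage temperature than at a reference temperature. The activation energy and the two temperatures below are illustrative assumptions, not values from the article.

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)


def rate_constant(pre_exponential, activation_energy, temp_kelvin):
    """Arrhenius equation: k = A * exp(-Ea / (R * T))."""
    return pre_exponential * math.exp(-activation_energy / (R * temp_kelvin))


def acceleration_factor(activation_energy, temp_ref_c, temp_accel_c):
    """Ratio k(accelerated) / k(reference); the pre-exponential factor cancels."""
    t_ref = temp_ref_c + 273.15
    t_acc = temp_accel_c + 273.15
    return math.exp(activation_energy / R * (1.0 / t_ref - 1.0 / t_acc))


# ASSUMPTION: Ea = 83 kJ/mol is a hypothetical value chosen for illustration.
EA = 83_000.0  # J/mol
af = acceleration_factor(EA, temp_ref_c=25.0, temp_accel_c=40.0)
print(f"Degradation at 40 C is about {af:.1f}x faster than at 25 C")
```

The estimate holds only while a single degradation mechanism with a constant activation energy dominates over the temperature range, which is one reason such models work best for well-understood systems.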

The greater complexity of current drug candidates combined with the increased volume and variety of available formulation data is revealing the limitations of mechanistic models to solve some formulation challenges, Price says. Many new drug molecules fall outside the “rule-of-five” chemical space and exhibit behaviors that are not easily captured by conventional equations.
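The rule-of-five screen itself is simple enough to sketch in a few lines. Lipinski's criteria flag poor oral absorption as more likely when a molecule has more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors, a molecular weight above 500 Da, or a calculated logP above 5. A common convention, assumed here, treats a molecule as compliant if it violates at most one criterion. The descriptor values for the two molecules below are hypothetical placeholders, not data for real compounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical values in this sketch)."""
    mol_weight: float       # Da
    log_p: float            # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the names of the Lipinski criteria that the molecule violates."""
    checks = {
        "molecular weight > 500 Da": d.mol_weight > 500,
        "logP > 5": d.log_p > 5,
        "H-bond donors > 5": d.h_bond_donors > 5,
        "H-bond acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def is_rule_of_five_compliant(d: Descriptors, max_violations: int = 1) -> bool:
    # ASSUMPTION: at most one violation is tolerated (a common convention).
    return len(rule_of_five_violations(d)) <= max_violations


# Hypothetical molecules for illustration only.
classic = Descriptors(mol_weight=320.0, log_p=2.1, h_bond_donors=2, h_bond_acceptors=5)
beyond = Descriptors(mol_weight=850.0, log_p=6.3, h_bond_donors=4, h_bond_acceptors=12)

for label, mol in (("classic", classic), ("beyond-rule-of-five", beyond)):
    print(label, is_rule_of_five_compliant(mol), rule_of_five_violations(mol))
```

A pass/fail heuristic like this needs only four descriptors per molecule, which is why it is easy to interpret and also why it says little about candidates that sit well outside the space it was derived from.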

“In this evolving landscape, AI/ML models offer distinct advantages. ML algorithms can process and learn from large, multidimensional datasets, capturing subtle, nonlinear relationships between formulation variables and outcomes such as solubility, stability, and bioavailability,” Price comments. Unlike mechanistic models, he adds, ML approaches are not constrained by predefined assumptions and can uncover patterns that may be invisible to human experts, enabling formulation scientists to explore a far broader design space, accelerate optimization cycles, and address formulation challenges that would otherwise be intractable using traditional methods. Thus, while mechanistic models remain foundational tools in formulation science, Price believes AI/ML models provide powerful complementary capabilities, especially for novel drug development scenarios.

It is also advantageous, according to Bannigan, that AI/ML algorithms can be slotted directly into existing predictive modeling approaches by turning what is usually a static experimental plan into a living, adaptive process. “Instead of running a fixed set of tests designed at the start using design-of-experiment approaches, AI/ML begins with a small number of experiments, learns from the results, and instantly updates its recommendations for the next round. This cycle repeats, zeroing in on the most promising parts of the formulation space while skipping unproductive paths,” Bannigan explains. The advantages over traditional approaches include speed, efficiency, and reach. “Predictive models leveraging AI/ML enable identification of high-potential formulations faster using less API, and often find solutions that conventional methods never get close to,” he concludes.
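The adaptive cycle Bannigan describes can be illustrated with a toy Bayesian optimization loop. The sketch below is not the method used by any company named in the article; it assumes a single design variable (an excipient fraction), a Gaussian-process surrogate with a squared-exponential kernel, and an upper-confidence-bound rule for choosing the next experiment. The `run_experiment` function is a hypothetical stand-in for a real lab measurement, and all numeric settings are illustrative.

```python
import math


def rbf(a, b, length_scale=0.15):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a Gaussian process at x_new."""
    n = len(x_obs)
    y_mean = sum(y_obs) / n  # center the data so a zero-mean prior is reasonable
    k_mat = [[rbf(x_obs[i], x_obs[j]) + (noise if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    k_star = [rbf(x, x_new) for x in x_obs]
    alpha = solve(k_mat, [y - y_mean for y in y_obs])
    weights = solve(k_mat, k_star)
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    var = max(1.0 - sum(k * w for k, w in zip(k_star, weights)), 0.0)
    return mean, math.sqrt(var)


def run_experiment(x):
    """HYPOTHETICAL stand-in for a lab measurement (e.g., a dissolution score)."""
    return (math.exp(-((x - 0.62) / 0.12) ** 2)
            + 0.3 * math.exp(-((x - 0.20) / 0.10) ** 2))


CANDIDATES = [i / 50 for i in range(51)]  # excipient fraction from 0.0 to 1.0
N_ROUNDS = 10                             # experiments chosen by the model
KAPPA = 2.0                               # exploration weight in the UCB rule

x_obs = [0.0, 0.5, 1.0]                   # small initial set of experiments
y_obs = [run_experiment(x) for x in x_obs]

for _ in range(N_ROUNDS):
    untested = [c for c in CANDIDATES if c not in x_obs]

    def ucb(c):
        mean, std = gp_posterior(x_obs, y_obs, c)
        return mean + KAPPA * std  # favor high predictions and high uncertainty

    next_x = max(untested, key=ucb)       # recommend the next experiment
    x_obs.append(next_x)
    y_obs.append(run_experiment(next_x))  # "run" it and feed the result back

best_y, best_x = max(zip(y_obs, x_obs))
print(f"best fraction tested: {best_x:.2f}, score: {best_y:.3f}")
```

Each pass through the loop refits the surrogate to all results so far and spends the next experiment where the predicted score plus its uncertainty is largest, in contrast to a fixed design that commits to every run up front. Real formulation problems involve many variables and noisy measurements, so this one-dimensional, noise-free example only conveys the shape of the workflow.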

Enhancing platform formulation strategies

AI/ML can also be used to improve platform processes for formulation development. One example, says Bannigan, is the use of these technologies as decision-making engines within self-driving labs and automated formulation platforms. “In this case, the AI/ML system analyzes incoming experimental data in real time, recommends the next most informative experiments, and identifies multiple viable formulation paths rather than prematurely converging on a single lead,” he explains. The system also standardizes workflows so that every project benefits from cumulative platform learning.

When integrated into automated laboratory platforms, adds Price, AI/ML technologies can not only help reduce the manual effort required and the variability associated with human operation but also enable the generation of large volumes of high-quality, consistent experimental data. Those data can in turn be used to train and refine machine learning models, enabling the discovery of complex patterns and relationships that might otherwise go unnoticed.

Price also notes that this process is enhanced by integrating ‘active learning’ strategies such as Bayesian optimization, which works in tandem with ML models to iteratively select the experiments expected to most efficiently reduce model uncertainty and accelerate the search for optimal formulations.

“This synergy between automation, AI/ML, and active learning allows researchers to maximize the impact of each experiment, accelerate formulation optimization, and tackle challenging formulation development problems,” Price says. He adds that “these advances enable drug developers to innovate faster, reduce costs, and make more confident, evidence-based decisions throughout the formulation lifecycle.”

“The tightly integrated, data-rich, closed-loop cycle created by combining automation, AI/ML, and active learning not only accelerates the design–build–learn process and boosts experimental efficiency, it also reduces API consumption and increases the likelihood of finding scalable, high-performance formulations,” Bannigan contends. This intelligent, iterative process, he believes, will help companies identify novel formulation approaches and, ultimately, new life-altering medicines that would otherwise go undiscovered.

Increasing the likelihood of candidate success

Accelerating formulation development and identifying new formulation strategies are not the only advantages of leveraging AI/ML technologies when exploring formulation spaces. By rapidly focusing development on formulations with the highest predicted likelihood of meeting target product profiles, AI/ML also cuts down both early attrition and late-stage failures from chemistry, manufacturing, and controls-related issues, according to Bannigan.

“This targeted approach improves in vitro–in vivo correlation from the start, ensures scalable formulations are identified sooner, and conserves API so teams can explore backup candidates in parallel. The result is a higher probability that promising small-molecule drugs will make it through development and into the hands of patients, essentially turning potential therapies into real, market-ready medicines faster and with less risk,” Bannigan contends.

One industrially relevant example is a predictive model developed by Merck KGaA, Darmstadt, Germany, for accurately and reliably predicting cocrystal coformers for a range of APIs without the need for any prior experiments. Optimum cocrystals, according to Price, often exhibit significantly higher solubilities than the original API, but identifying them can be challenging. The AI-based model was trained using a large, carefully curated dataset of test drugs and coformers covering a large chemical space, supplemented with targeted experimental data. It has been shown to dramatically improve the likelihood of finding the optimum coformer, by a factor of three compared with the trial-and-error approach (2).

While hindrances to adoption of AI/ML in formulation development do exist, such as fragmented, inconsistent, and proprietary historical data; cultural and regulatory caution that demands interpretability; and the lack of integrated automation infrastructure to operationalize models effectively, Bannigan anticipates AI/ML becoming a standard part of formulation workflows within the next one to three years. He sees such systems becoming integrated with high-throughput platforms, driving faster design–build–learn cycles, and using active learning to adapt in real time with model design and experiment selection guided by large language models (LLMs) capable of rapidly synthesizing literature, historical data, and domain expertise.

In the longer term (3–10 years), Bannigan predicts foundation models, federated learning, and AI-driven excipient design will transform formulation development from trial-and-error to precision engineering, with LLMs evolving into specialized scientific copilots. “In this future, more therapies will advance successfully from development to the patients who need them,” he concludes.

References

1. Huanbutta, K.; et al. Artificial Intelligence-Driven Pharmaceutical Industry: A Paradigm Shift in Drug Discovery, Formulation Development, Manufacturing, Quality Control, and Post-Market Surveillance. European Journal of Pharmaceutical Sciences, 2024 203, 106938. DOI: 10.1016/j.ejps.2024.106938
2. Merck KGaA. Harnessing AI To Speed Up Drug Formulation. EMD Group Science Space Blog, emdgroup.com.
3. Bannigan, P.; et al. Machine Learning Directed Drug Formulation Development. Advanced Drug Delivery Reviews, 2021 175, 113806. DOI: 10.1016/j.addr.2021.05.016
4. Konagurthu, S. Revolutionizing Drug Development: AI-driven Solutions for Poor Solubility and Bioavailability. Patheon Blog, Patheon.com, April 15, 2024.
5. Ting, J.M.; et al. Advances in Polymer Design for Enhancing Oral Drug Solubility and Delivery. Bioconjugate Chemistry, 2018 29 (4), 939–952. DOI: 10.1021/acs.bioconjchem.7b00646
6. Halford, B. Wrestling with Lipinski’s Rule of 5. C & E News, 2023 101 (8). https://cen.acs.org/pharmaceuticals/drug-discovery/Wrestling-Lipinski-rule-5/101/i8

About the author

Cynthia A. Challener, PhD, is a contributing editor to Pharmaceutical Technology®.

Citation

When referring to this article, please cite it as: Challener, C. A. Predictive Modeling for Small-Molecule Formulation Development Using Advanced Algorithms. Pharmaceutical Technology 2025 49 (8), 14–17.
