Embracing the Power of Algorithms
Over the past decade, collaborative research among chemists, cheminformatics experts, chemical engineers, and data scientists has produced a range of algorithms capable of predicting complete synthesis routes. These algorithms have been built into both commercial and open-source software packages, such as IBM RXN for Chemistry, ASKCOS, ChemAIRS, Synthia, Reaxys, and SciFinder-n, and are now readily accessible to the chemistry community. Although developed primarily to assist in drug development, these predictive tools can also improve commercial route selection by broadening the pool of candidate routes.
One notable application of these algorithms was in the synthesis of Lotiglipron at Pfizer. Researchers used ASKCOS, a retrosynthesis tool developed by the Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS) consortium, and manually incorporated the promising routes it generated. ASKCOS takes a target molecule as input and generates potential synthetic routes using machine learning models trained on various chemical databases. Despite their sophistication, such tools often struggle with complex transformations, such as heterocycle formation or the creation of desired chiral species, both of which are common in pharmaceutical applications. As a result, significant manual effort is still required to filter out impractical suggestions.
Figure 1: Diversity of routes generated for a single target molecule
In the case of Lotiglipron, researchers proposed a method of parallel processing where synthetic ideas were gathered and filtered separately before merging them with human contributions at the end of a brainstorming cycle. This approach aimed to mitigate the noise generated by predictive algorithms, which can overwhelm human creativity and decision-making by masking areas of interest.
The primary challenge in refining these predictive tools lies in the quality of the available training data. Current datasets, often derived from electronic lab notebooks, public patents, and literature sources, are biased towards successful reactions, so the models see few examples of what fails. Researchers suggest that utilising curated data from synthesis networks could significantly improve the algorithms’ performance.
The Lotiglipron synthesis network illustrates how these tools can be applied. The network integrates both human and algorithm-generated ideas and presents them in a way that makes individual synthesis routes easy to identify. The six routes proposed by humans and the six proposed by the software were further enriched with annotations and data on specific route properties, allowing the route selection process to be optimised. Queries could then be made to find the shortest synthesis route from the target molecule to starting materials using Dijkstra’s algorithm. Dijkstra’s algorithm finds the path with the lowest total weight, so when every step carries the same weight it minimises the number of steps involved.
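The shortest-route query described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the Lotiglipron work: the network, the molecule names, and the `shortest_route` function are all hypothetical, and the sketch assumes a simplified linear view in which each retrosynthetic step leads from one molecule to one precursor.

```python
import heapq

# Hypothetical toy network: each key is a molecule, each value lists the
# precursors reachable in one retrosynthetic step. Names are placeholders.
NETWORK = {
    "target": ["int_A", "int_B"],
    "int_A": ["int_C"],
    "int_B": ["sm_2"],
    "int_C": ["sm_1"],
    "sm_1": [],
    "sm_2": [],
}
STARTING_MATERIALS = {"sm_1", "sm_2"}


def shortest_route(network, target, starting_materials):
    """Dijkstra's algorithm with every step weighted 1.

    Returns (number_of_steps, path) for the route with the fewest steps
    from the target to any starting material, or None if none is reachable.
    """
    queue = [(0, target, [target])]  # (steps so far, molecule, path taken)
    best = {target: 0}               # fewest steps known to reach each molecule
    while queue:
        steps, molecule, path = heapq.heappop(queue)
        if molecule in starting_materials:
            return steps, path
        if steps > best.get(molecule, float("inf")):
            continue  # stale queue entry; a shorter way was already found
        for precursor in network.get(molecule, []):
            new_steps = steps + 1
            if new_steps < best.get(precursor, float("inf")):
                best[precursor] = new_steps
                heapq.heappush(queue, (new_steps, precursor, path + [precursor]))
    return None


print(shortest_route(NETWORK, "target", STARTING_MATERIALS))
# → (2, ['target', 'int_B', 'sm_2'])
```

Real synthesis routes are often convergent, with several precursors feeding one step, so a production query would need a richer graph model than this linear sketch.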
However, the initial results highlighted the need for more comprehensive information. For instance, some routes suggested by the algorithm started from materials that were not commercially available, which made them impractical. This underscores the importance of including information on the availability and complexity of starting materials in the predictive models.
Ultimately, these efforts represent significant strides towards integrating artificial intelligence with human expertise in chemical synthesis. Continued improvement of predictive algorithms, coupled with high-quality training data, could change the way chemists approach the synthesis of complex molecules and pave the way for more efficient and innovative drug development.
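The starting-material availability check discussed above could look something like the following sketch. The catalogue, the route records, and the `split_by_availability` function are hypothetical placeholders; in practice the catalogue would come from a supplier database.

```python
# Hypothetical catalogue of purchasable compounds (placeholder names).
COMMERCIALLY_AVAILABLE = {"sm_1", "sm_3"}

# Hypothetical route records annotated with their origin and starting materials.
ROUTES = [
    {"name": "route_1", "origin": "human", "starting_materials": ["sm_1"]},
    {"name": "route_2", "origin": "algorithm", "starting_materials": ["sm_1", "sm_2"]},
    {"name": "route_3", "origin": "algorithm", "starting_materials": ["sm_3"]},
]


def split_by_availability(routes, catalogue):
    """Separate routes whose starting materials can all be purchased
    from routes that depend on at least one unavailable material."""
    practical, flagged = [], []
    for route in routes:
        missing = [sm for sm in route["starting_materials"] if sm not in catalogue]
        if missing:
            flagged.append((route["name"], missing))
        else:
            practical.append(route["name"])
    return practical, flagged


practical, flagged = split_by_availability(ROUTES, COMMERCIALLY_AVAILABLE)
print(practical)  # → ['route_1', 'route_3']
print(flagged)    # → [('route_2', ['sm_2'])]
```

Flagging a route rather than discarding it keeps the idea visible, in case the missing material turns out to be easy to make.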
Leveraging Our Platform for Comprehensive Synthesis Planning
Our platform can integrate results from leading predictive tools such as IBM RXN for Chemistry, ASKCOS, Synthia, and Chemical.AI (availability depends on subscription plan), providing a comprehensive solution for synthesis planning. By bringing these algorithms together in one place, we give users access to a wide array of predictive models and synthesis routes. The platform curates and filters data using extensive chemical databases and real-time project feedback to refine predictions. This reduces noise so that the most viable and innovative routes are the ones presented. The user interface supports visualisation and exploration of synthesis pathways and makes collaboration between chemists, cheminformatics experts, and data scientists easier.
Figure 2: Data aggregation capabilities of the RxnHub platform
Advanced query capabilities, including shortest-path searches based on Dijkstra’s algorithm, enable highly customised searches for optimal synthesis routes under specific constraints such as cost, safety, and environmental impact. Users can find the shortest and most practical routes for their project needs. Additionally, our platform continuously learns from user interactions and integrates the latest scientific data, so that it keeps pace with current chemical research. This adaptable approach makes our platform a valuable tool for any project involving synthesis planning, from drug development to materials science and beyond.
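To illustrate how a constraint-based search of this kind can work in principle, the sketch below runs Dijkstra’s algorithm with each step weighted by a combination of cost, safety, and environmental scores. It is an assumption-laden example and not the platform’s actual query engine: the network, the scores, the `priorities` weighting scheme, and the function names are all hypothetical. Dijkstra’s algorithm requires non-negative weights, so the scores and priorities here are assumed to be non-negative.

```python
import heapq

# Hypothetical annotated network: each molecule maps to (precursor, annotations)
# pairs. Scores are invented placeholders; higher means worse.
NETWORK = {
    "target": [
        ("int_A", {"cost": 5.0, "safety": 1.0, "environment": 1.0}),
        ("int_B", {"cost": 1.0, "safety": 4.0, "environment": 3.0}),
    ],
    "int_A": [("sm_1", {"cost": 2.0, "safety": 1.0, "environment": 1.0})],
    "int_B": [("sm_2", {"cost": 1.0, "safety": 5.0, "environment": 2.0})],
}
STARTING_MATERIALS = {"sm_1", "sm_2"}


def step_weight(annotations, priorities):
    """Combine a step's scores into one non-negative weight (weighted sum)."""
    return sum(priorities[key] * annotations[key] for key in priorities)


def best_route(network, target, starting_materials, priorities):
    """Dijkstra's algorithm over weighted steps.

    Returns (total_weight, path) for the lowest-weight route from the
    target to any starting material, or None if none is reachable.
    """
    queue = [(0.0, target, [target])]
    best = {target: 0.0}
    while queue:
        total, molecule, path = heapq.heappop(queue)
        if molecule in starting_materials:
            return total, path
        if total > best.get(molecule, float("inf")):
            continue  # stale queue entry
        for precursor, annotations in network.get(molecule, []):
            new_total = total + step_weight(annotations, priorities)
            if new_total < best.get(precursor, float("inf")):
                best[precursor] = new_total
                heapq.heappush(queue, (new_total, precursor, path + [precursor]))
    return None


cost_first = {"cost": 1.0, "safety": 0.0, "environment": 0.0}
safety_first = {"cost": 0.1, "safety": 1.0, "environment": 0.5}

print(best_route(NETWORK, "target", STARTING_MATERIALS, cost_first)[1])
# → ['target', 'int_B', 'sm_2']
print(best_route(NETWORK, "target", STARTING_MATERIALS, safety_first)[1])
# → ['target', 'int_A', 'sm_1']
```

Changing the priorities changes which route wins, which is the point of a customised search: the same network answers different questions depending on what the project values.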