AI Medicinal Chemist Published in Nature Communications: Reproducing Chemists' Professional Knowledge Is Expected to Accelerate Drug Development

2023-11-02 04:32:04

Original source: Academic Headlines

Image source: Generated by Unbounded AI

Drug discovery is a complex, multi-step process that involves the intersection of many chemistry and biology subdisciplines. Human medicinal chemists, with expertise accumulated over many years, play an important role in it.

So, can artificial intelligence (AI) support medicinal chemists in drug discovery? The answer is probably yes.

A team of researchers from the Novartis Institutes for BioMedical Research (NIBR) and Microsoft Research AI4Science has proposed a machine learning model that partially reproduces the collective knowledge that medicinal chemists accumulate in their work, often referred to as "chemical intuition."

The research team believes that this approach may be used as a complement to molecular modeling, making future drug development more efficient.

The research paper, titled "Extracting medicinal chemistry intuition via preference machine learning", has been published in Nature Communications.

Machine Learning Reproduces Medicinal Chemists' Expertise

Medicinal chemists, both wet-lab and computational, play a critical role in the "lead optimization" phase of drug discovery, as they are often asked to decide which compounds should be synthesized and evaluated in subsequent optimization rounds.

To do this, medicinal chemists typically review data such as compound activity, ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties, or target structure information. The success of a project therefore depends not only on the quality of the experimental data generated, but also on the soundness of the decisions made by the medicinal chemistry team.

Medicinal chemists can make these decisions more efficiently because they draw on their experience to form an intuitive understanding of what makes a compound successful across the iterations of early drug discovery.

Although previous attempts have been made to formalize this knowledge using rule-based methods or simple cheminformatics feasibility scores, capturing the subtleties and complexities of how medicinal chemists score compounds remains a fundamental challenge.

With this motivation in mind, the study explores whether this expertise can be distilled into part of a machine learning model. Such a model could be deployed as an aid to the decision-making process in lead optimization or other aspects of drug discovery, as has been reported in the industry.

Because medicinal chemistry currently relies mainly on human work, it is inevitably subject to subjective bias. Some studies have reported low agreement in scores both between different medicinal chemists and for the same chemist over time. In this study, the researchers hope to address some of these problems by borrowing strategies from multiplayer games.

They treat the task of ranking a set of molecules as a preference learning problem, and then use a simple neural network to model individual preferences.

Figure: Overall schematic diagram of the main idea of the study (source: the paper)

Specifically, as shown in the figure above, each molecule is treated as a player in a competitive game, and the probability that one molecule beats another is determined by the feedback provided by the chemists. To provide this feedback, a medicinal chemist answers a pre-specified question in a web application by selecting one of two molecules. A total of 35 Novartis medicinal chemists were involved in the process, resulting in the collection of more than 5,000 annotations.
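
The article does not spell out the formula for the winning probability. One common way to write it, assumed here purely for illustration, is the logistic (Bradley-Terry style) form, where $s(\cdot)$ is a hypothetical latent score assigned to each molecule and $\sigma$ is the logistic function:

```latex
P(A \succ B) \;=\; \sigma\bigl(s(A) - s(B)\bigr) \;=\; \frac{1}{1 + e^{-\left(s(A) - s(B)\right)}}
```

Under this assumption, two molecules with equal scores each win with probability 0.5, and the more a chemist-preferred molecule outscores its rival, the closer the probability gets to 1.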

This feedback is used to train an implicit scoring model. The model uses a neural network architecture with two arms that share tied weights and characterizes each molecule with common cheminformatics descriptors. During training, its parameters are optimized with a binary cross-entropy (BCE) loss, which depends on the latent score difference of the molecule pair and the feedback provided by the chemist.
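
The training idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation and not the MolSkill API: a linear scorer stands in for the neural network, the sketch assumes both molecules of a pair are scored with the same parameters, and the two-element descriptor vectors, the molecule names, and the functions `score` and `train` are all hypothetical.

```python
import math
import random

def score(weights, descriptors):
    """Latent score of one molecule (a linear stand-in for the neural network)."""
    return sum(w * x for w, x in zip(weights, descriptors))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, n_features, epochs=200, lr=0.1, seed=0):
    """Fit weights by SGD on a BCE loss over pairwise preferences.

    pairs: list of (desc_a, desc_b, label); label is 1.0 if the
    (hypothetical) chemist preferred molecule A, otherwise 0.0.
    """
    rng = random.Random(seed)
    weights = [rng.uniform(-0.01, 0.01) for _ in range(n_features)]
    order = list(pairs)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(order)
        for a, b, label in order:
            # Predicted probability that A is preferred, from the score difference.
            p = sigmoid(score(weights, a) - score(weights, b))
            # Gradient of the BCE loss w.r.t. each weight is (p - label) * (a_k - b_k).
            for k in range(n_features):
                weights[k] -= lr * (p - label) * (a[k] - b[k])
    return weights

# Toy, made-up annotations: A is preferred when feature 0 is high and feature 1 is low.
toy_pairs = [
    ([0.9, 0.1], [0.2, 0.8], 1.0),
    ([0.3, 0.7], [0.8, 0.2], 0.0),
    ([0.7, 0.3], [0.4, 0.9], 1.0),
    ([0.1, 0.6], [0.6, 0.5], 0.0),
]
weights = train(toy_pairs, n_features=2)

# After training, any molecule can be scored on its own and used for ranking.
new_molecules = {"mol_x": [0.8, 0.2], "mol_y": [0.2, 0.9]}
ranking = sorted(new_molecules,
                 key=lambda name: score(weights, new_molecules[name]),
                 reverse=True)
print(ranking)
```

Because the same `weights` are used for both molecules in every pair, the trained scorer can be applied to a single molecule at inference time, which is what makes the pairwise feedback usable for ranking.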

Once the training is complete, the score of any arbitrary molecule can be inferred, which can then be used for downstream cheminformatics tasks.

In addition, the model can assess drug-likeness: according to the study, the learned scoring function it proposes is more accurate than the traditional quantitative estimate of drug-likeness (QED).

Notably, in order to facilitate the reproducibility of the study and further development of the field, the researchers also provided a software package called "MolSkill", which contains the model and anonymized response data.

Limitations and Applications of Machine Learning in Medicinal Chemistry

However, while this model can reproduce the knowledge accumulated by medicinal chemists in their work, it has some limitations. First, the questions asked during data collection were kept vague in order to capture chemical intuition.

Second, while the proposed study design resulted in greater agreement between participants than in previous studies, the pairwise comparison method is not perfect.

In addition, the "Flatland fallacy" leads humans to reduce high-dimensional problems to a small set of cognitively tractable variables, and this simplification may be influenced by the individual characteristics of each medicinal chemist.

However, the research team stated that the model proposed in this study is not limited to the scope of application of the current study. Specifically, the framework discussed could be extended to other quantifiable but expensive observables in the field of drug discovery. In addition, it can provide insights into unexplored areas of the chemical space.

With this in mind, the research team believes that similar architectures, trained on human-generated data, could learn some popular rule-based filters, overcoming the major limitation of having to filter compounds manually before making inferences.

In the same direction, the proposed scoring method could also be used to prioritize combinatorially generated compounds in synthetic chemistry libraries, which are difficult to screen with existing rule-based methods because of their inherent novelty.

Another research direction is to test the applicability of the framework in a prospective, target-specific lead optimization scenario that requires comprehensive consideration of information from multiple sources (e.g., biological properties, ADMET, etc.).

"Machine learning methods can design thousands of compounds, and techniques such as high-throughput screening can highlight a large number of candidate compounds in the early stages of the drug discovery process," the research team wrote in the paper. "The proposed scoring method can be used to implicitly incorporate chemists' intuition and to screen compounds without manual checks. Such applications are expected to accelerate adoption of the method and increase trust in it in the coming years."

Paper Links:
