Paper source: https://www.ncbi.nlm.nih.gov/pubmed/27642066
The researchers developed a novel data-driven approach, called PrOCTOR, that integrates the chemical properties of a compound with those of its protein targets to estimate whether a drug in clinical trials will fail for toxicity reasons. The paper was published in Cell Chemical Biology in 2016.
Motivation of the study
Over the past decade, great progress has been made in the treatment of various diseases. However, drug attrition rates have continued to rise substantially because of clinical trial failures. Clinical trials of potential drugs can fail for various reasons, ranging from design issues to problems with drug efficacy and toxicity. Nearly one-third of the drugs that fail clinical trials do so because of intolerable side effects.
Almost two decades ago, drug-likeness approaches were proposed to improve the success rate of drug discovery. These approaches aim to predict whether a chemical compound is a potential drug. However, they consider only the molecular properties of the drug itself, and cannot distinguish drugs with unmanageable toxicity from safe ones. To address this problem, the authors used publicly available data on drugs that failed for toxicity in trials (FTT drugs) to develop a machine learning algorithm that could accurately predict the toxicity of potential drugs.
Data, features and prediction algorithm
The authors combined the chemical properties of a compound with those of its protein targets to generate a new measure, called the “PrOCTOR score”, that helps predict whether a drug in clinical trials will fail for toxicity (Figure 1). They trained PrOCTOR as a random forest, an ensemble learning method for classification. The forest is built from 50 bootstrapped decision trees.
Figure 1. The algorithm of the PrOCTOR method.
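To make the idea of "50 bootstrapped decision trees" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: each tree is simplified to a one-split stump on a single randomly chosen feature, the two-feature data are synthetic, and all function names are hypothetical. How the paper converts tree votes into the final PrOCTOR score is not described in this article, so the sketch only reports the fraction of trees voting "toxic".

```python
import random

N_TREES = 50  # the article reports a forest of 50 bootstrapped trees


def predict_stump(stump, row):
    """Apply a one-split tree: one label above the threshold, the other otherwise."""
    feature, threshold, label_above = stump
    return label_above if row[feature] > threshold else 1 - label_above


def fit_stump(rows, labels, rng):
    """Fit a depth-1 tree on one randomly chosen feature (a simplification)."""
    feature = rng.randrange(len(rows[0]))
    best_correct, best_stump = -1, None
    for threshold in sorted({row[feature] for row in rows}):
        for label_above in (0, 1):
            stump = (feature, threshold, label_above)
            correct = sum(predict_stump(stump, r) == y for r, y in zip(rows, labels))
            if correct > best_correct:
                best_correct, best_stump = correct, stump
    return best_stump


def fit_forest(rows, labels, n_trees=N_TREES, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def toxic_vote_fraction(forest, row):
    """Fraction of trees that vote 'toxic' (label 1) for one feature vector."""
    return sum(predict_stump(stump, row) for stump in forest) / len(forest)


# Synthetic, overlapping data: label 0 = approved (safe), label 1 = FTT (toxic).
rng = random.Random(1)
safe = [[rng.uniform(0, 2), rng.uniform(0, 2)] for _ in range(40)]
toxic = [[rng.uniform(1, 3), rng.uniform(1, 3)] for _ in range(40)]
forest = fit_forest(safe + toxic, [0] * 40 + [1] * 40)
print(toxic_vote_fraction(forest, [2.8, 2.8]), toxic_vote_fraction(forest, [0.2, 0.2]))
```

Because every tree sees a different resample of the data, individual trees disagree near the class boundary, and the vote fraction behaves like a graded score rather than a hard yes/no label.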
Compared to the drug-likeness approaches, the PrOCTOR model includes many more features. In total, there are 48 features for each drug in the PrOCTOR model: 10 molecular properties, 34 target-based properties, and 4 drug-likeness rule features. As anticipated, no single feature could successfully distinguish FTT drugs (i.e. toxic ones) from FDA-approved drugs (safe ones) (Figure 2).
Figure 2. Distribution of individual features for FTT drugs (red) and FDA-approved drugs (blue).
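The feature layout described above (10 + 34 + 4 = 48 values per drug) can be sketched as a simple concatenation with a size check. The group sizes come from the article; the function name, argument names, and placeholder values are assumptions made for illustration, and the actual feature definitions are not listed here.

```python
N_MOLECULAR = 10  # molecular properties of the compound
N_TARGET = 34     # target-based properties
N_RULES = 4       # drug-likeness rule features


def build_feature_vector(molecular, target, rules):
    """Concatenate the three feature groups into one 48-value vector."""
    groups = (
        ("molecular", molecular, N_MOLECULAR),
        ("target", target, N_TARGET),
        ("rules", rules, N_RULES),
    )
    for name, values, size in groups:
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
    return [float(v) for v in (*molecular, *target, *rules)]


# Placeholder values only; real inputs would be computed per drug.
vector = build_feature_vector([0.0] * 10, [0.0] * 34, [1, 0, 1, 1])
print(len(vector))
```

Keeping the groups separate until the last step makes it easy to retrain on one group alone, which is one way to check how much the target-based features add over the molecular ones.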
The performance of the model
A dataset of 784 FDA-approved drugs and 100 FTT drugs was used to train the model. On the training dataset, the PrOCTOR score achieves high accuracy, specificity, and sensitivity (∼0.75), as well as a high area under the ROC curve (>0.80). In comparison, none of the drug-likeness approaches was able to maintain both high sensitivity and high specificity (Figure 3). Robust feature analysis of the model indicates that it is critical to consider properties of the drug target, with the target’s network connectivity and liver toxicity as two of the most important features.
Figure 3. ROC curves for PrOCTOR and drug-likeness methods.
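The metrics quoted above are worth spelling out, because the classes are imbalanced: with 784 approved and 100 FTT drugs, a model that labels every drug "safe" would already reach 784/884 ≈ 0.89 accuracy while having zero sensitivity. The sketch below computes accuracy, sensitivity, specificity, and ROC AUC from scores and labels. The scores, labels, and the 0.5 threshold are made-up illustration values, not results from the paper, and the function names are hypothetical.

```python
def confusion_metrics(scores, labels, threshold):
    """Accuracy, sensitivity and specificity at one score threshold.

    Label 1 = toxic (FTT), label 0 = safe (approved).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),  # share of toxic drugs that are flagged
        "specificity": tn / (tn + fp),  # share of safe drugs that are cleared
    }


def roc_auc(scores, labels):
    """AUC as the chance that a random toxic drug outscores a random safe one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores: higher means "more likely toxic".
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.7]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
metrics = confusion_metrics(scores, labels, threshold=0.5)
auc = roc_auc(scores, labels)
print(metrics, auc)
```

Unlike accuracy, the AUC does not depend on a chosen threshold or on the class ratio, which is why ROC curves are a reasonable way to compare PrOCTOR with the drug-likeness methods.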
The authors then tested the model on hundreds of drugs approved in Europe and Japan, and on a larger database of 3236 drugs not included in the training dataset. Overall, PrOCTOR accurately predicted drug toxicity in the test dataset, and even identified FDA-approved drugs that were later monitored for reports of serious side effects (Figure 4).
Figure 4. (Left) PrOCTOR scores for the FTT drugs and FDA-approved drugs in the training set, as well as European (EMA)- and Japanese (JP17)-approved drugs. (Right) Three molecules predicted by PrOCTOR as most likely to possess toxicity. Notably, all are FDA-approved drugs that were later reported to have serious side effects.
Lessons learned from the PrOCTOR project
The PrOCTOR method provides a novel, data-driven, and broadly applicable strategy for identifying drugs likely to show toxicity in clinical trials. This approach could substantially speed up the discovery and testing of new drugs.
Although the PrOCTOR algorithm itself is not very complicated, its success offers some valuable lessons: (i) the authors made full use of publicly available data on failures in drug discovery; (ii) they took previously overlooked features into consideration. In other words, all available information on clinical trials and candidate drugs should be collected and used.
Gayvert KM et al. (2016) A Data-Driven Approach to Predicting Successes and Failures of Clinical Trials. Cell Chemical Biology.
Analyst: Genome Hunter, Yu-Cheng Yang|Editor: Hao Wang|Localized by Synced Global Team : Xiang Chen