Publications

Caba K, Tran-Nguyen VK, Rahman T, Ballester PJ et al., 2024, Comprehensive machine learning boosts structure-based virtual screening for PARP1 inhibitors, Journal of Cheminformatics, Vol: 16

Poly ADP-ribose polymerase 1 (PARP1) is an attractive therapeutic target for cancer treatment. Machine-learning scoring functions constitute a promising approach to discovering novel PARP1 inhibitors. Cutting-edge PARP1-specific machine-learning scoring functions were investigated using semi-synthetic training data from docking activity-labelled molecules: known PARP1 inhibitors, hard-to-discriminate decoys property-matched to them with generative graph neural networks, and confirmed inactives. We further made test sets harder by including only molecules dissimilar to those in the training set. Comprehensive analysis of these datasets using five supervised learning algorithms, together with protein–ligand fingerprints extracted from docking poses and ligand-only features, revealed one highly predictive scoring function. This is the PARP1-specific support vector machine-based regressor employing PLEC fingerprints, which achieved a high Normalized Enrichment Factor at the top 1% on the hardest test set (NEF1% = 0.588, median of 10 repetitions), and was more predictive than any other investigated scoring function, especially the classical scoring function employed as baseline.
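As an illustration of the NEF1% metric quoted in this abstract, the sketch below computes a normalized enrichment factor for a ranked library. It assumes the common definition NEF = EF / EF_max at a given top fraction and that higher scores mean "more likely active"; the function name and arguments are hypothetical and are not taken from the paper's code.

```python
def nef_at_top(scores, labels, fraction=0.01):
    """Normalized enrichment factor at the top `fraction` of a ranked library.

    Assumptions (not from the paper): higher score = more likely active,
    labels are 1 (active) or 0 (inactive), and NEF = EF / EF_max.
    """
    n = len(scores)
    n_actives = sum(labels)
    if n == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    # Number of molecules in the selected top slice (at least one).
    n_top = max(1, int(round(n * fraction)))
    # Rank molecules from best to worst predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])
    # Enrichment factor: hit rate in the top slice over hit rate in the library.
    ef = (hits / n_top) / (n_actives / n)
    # Best achievable enrichment factor for this library and slice size.
    ef_max = (min(n_top, n_actives) / n_top) / (n_actives / n)
    return ef / ef_max
```

Under this definition the value lies between 0 and 1, so it can be compared across test sets with different proportions of actives.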

Journal article

Ballester P, Caba K, 2024, Comprehensive machine learning boosts structure-based virtual screening for PARP1 inhibitors, Journal of Cheminformatics, ISSN: 1758-2946

Journal article

Gomez-Sacristan P, Simeon S, Tran-Nguyen V-K, Patil S, Ballester P et al., 2024, Inactive-enriched machine-learning models exploiting patent data improve structure-based virtual screening for PDL1 dimerizers, Journal of Advanced Research, ISSN: 2090-1232

Introduction: Small-molecule Programmed Cell Death Protein 1/Programmed Death-Ligand 1 (PD1/PDL1) inhibition via PDL1 dimerization has the potential to lead to inexpensive drugs with better cancer patient outcomes and milder side effects. However, this therapeutic approach has proven challenging, with only one PDL1 dimerizer reaching early clinical trials so far. There is hence a need for fast and accurate methods to develop alternative PDL1 dimerizers. Objectives: We aim to show that structure-based virtual screening (SBVS) based on PDL1-specific machine-learning (ML) scoring functions (SFs) is a powerful drug design tool for detecting PD1/PDL1 inhibitors via PDL1 dimerization. Methods: By incorporating the latest MLSF advances, we generated and evaluated PDL1-specific MLSFs (classifiers and inactive-enriched regressors) on two demanding test sets. Results: 60 PDL1-specific MLSFs (30 classifiers and 30 regressors) were generated. Our large-scale analysis provides highly predictive PDL1-specific MLSFs that benefitted from training with large volumes of docked inactives and from enabling inactive-enriched regression. Conclusion: PDL1-specific MLSFs strongly outperformed generic SFs of various types on this target and are released here without restrictions.

Journal article

Ogunleye A, Piyawajanusorn C, Ghislat G, Ballester PJ et al., 2024, Large-scale machine learning analysis reveals DNA methylation and gene expression response signatures for gemcitabine-treated pancreatic cancer, Health Data Science, Vol: 4, ISSN: 2765-8783

Background: Gemcitabine is a first-line chemotherapy for pancreatic adenocarcinoma (PAAD), but many PAAD patients do not respond to gemcitabine-containing treatments. Being able to predict such nonresponders would hence permit the undelayed administration of more promising treatments while sparing those patients the life-threatening side effects of gemcitabine. Unfortunately, the few predictors of PAAD patient response to this drug are weak, and none of them yet exploits the power of machine learning (ML). Methods: Here, we applied ML to predict the response of PAAD patients to gemcitabine from the molecular profiles of their tumors. More concretely, we collected diverse molecular profiles of PAAD patient tumors along with the corresponding clinical data (gemcitabine responses and clinical features) from the Genomic Data Commons resource. By systematically combining 8 tumor profiles with 16 classification algorithms, we obtained 128 ML models, each of which was evaluated by multiple 10-fold cross-validations. Results: Only 7 of these 128 models were predictive, which underlines the importance of carrying out such a large-scale analysis to avoid missing the most predictive models. The most predictive were random forest using 4 selected mRNAs [0.44 Matthews correlation coefficient (MCC), 0.785 receiver operating characteristic–area under the curve (ROC-AUC)] and XGBoost combining 12 DNA methylation probes (0.32 MCC, 0.697 ROC-AUC). By contrast, the hENT1 marker obtained much worse, random-level performance (practically 0 MCC, 0.5 ROC-AUC). Despite not being trained to predict prognosis (overall and progression-free survival), these ML models were also able to anticipate this patient outcome. Conclusions: We release these promising ML models so that they can be evaluated prospectively on other gemcitabine-treated PAAD patients.
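For readers unfamiliar with the MCC values reported in this abstract, the following minimal sketch computes the Matthews correlation coefficient from a binary confusion matrix using its standard formula. The function name and the convention of returning 0 when the denominator vanishes are illustrative choices, not details from the paper.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (random-level)
    to +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention assumed here: an undefined MCC is reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

A classifier whose predictions are independent of the true response gives an MCC near 0, which is the sense in which the hENT1 marker is described as random-level.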

Journal article

Ballester PJ, 2023, The AI revolution in chemistry is not that far away, Nature, Vol: 624, Pages: 252-252, ISSN: 0028-0836

Journal article

Tran-Nguyen V-K, Junaid M, Simeon S, Ballester PJ et al., 2023, A practical guide to machine-learning scoring for structure-based virtual screening, Nature Protocols, Vol: 18, Pages: 3460-3511, ISSN: 1750-2799

Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol, can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.

Journal article

UK Government Office for Science, 2023, Future risks of frontier AI: which capabilities and risks could emerge at the cutting edge of AI in the future?, Publisher: UK Government Office for Science

Report

Loecher A, Bruyns-Haylett M, Ballester PJ, Borros S, Oliva N et al., 2023, A machine learning approach to predict cellular uptake of pBAE polyplexes, Biomaterials Science, Vol: 11, Pages: 5797-5808, ISSN: 2047-4830

The delivery of genetic material (DNA and RNA) to cells can cure a wide range of diseases but is limited by the delivery efficiency of the carrier system. Poly β-amino esters (pBAEs) are promising polymer-based vectors that form polyplexes with negatively charged oligonucleotides, enabling cell membrane uptake and gene delivery. pBAE backbone polymer chemistry, as well as terminal oligopeptide modifications, define cellular uptake and transfection efficiency in a given cell line, along with nanoparticle size and polydispersity. Moreover, uptake and transfection efficiency of a given polyplex formulation also vary from cell type to cell type. Therefore, finding the optimal formulation leading to high uptake in a new cell line is dictated by trial and error, and requires time and resources. Machine learning (ML) is an ideal in silico screening tool to learn the non-linearities of complex data sets, like the one presented herein, with the aim of predicting cellular internalisation of pBAE polyplexes. A library of pBAE nanoparticles was fabricated and the uptake studied in 4 different cell lines, on which various ML models were successfully trained. The best performing models were found to be gradient-boosted trees and neural networks. The gradient-boosted trees model was then analysed using SHapley Additive exPlanations, to interpret the model and gain insight into the important features and their impact on the predicted outcome.

Journal article

Tran-Nguyen V-K, Ballester P, 2023, Beware of simple methods for structure-based virtual screening: the critical importance of broader comparisons, Journal of Chemical Information and Modeling, Vol: 63, Pages: 1401-1405, ISSN: 1549-9596

We discuss how data unbiasing and simple methods such as protein-ligand Interaction FingerPrint (IFP) can overestimate virtual screening performance. We also show that IFP is strongly outperformed by target-specific machine-learning scoring functions, which were not considered in a recent report concluding that simple methods were better than machine-learning scoring functions at virtual screening.

Journal article

Hernandez-Hernandez S, Ballester PJ, 2023, On the best way to cluster NCI-60 molecules, Biomolecules, Vol: 13, ISSN: 2218-273X

Machine learning-based models have been widely used in the early drug-design pipeline. To validate these models, cross-validation strategies have been employed, including those using clustering of molecules in terms of their chemical structures. However, the poor clustering of compounds will compromise such validation, especially on test molecules dissimilar to those in the training set. This study aims at finding the best way to cluster the molecules screened by the National Cancer Institute (NCI)-60 project by comparing hierarchical, Taylor–Butina, and uniform manifold approximation and projection (UMAP) clustering methods. The best-performing algorithm can then be used to generate clusters for model validation strategies. This study also aims at measuring the impact of removing outlier molecules prior to the clustering step. Clustering results are evaluated using three well-known clustering quality metrics. In addition, we compute an average similarity matrix to assess the quality of each cluster. The results show variation in clustering quality from method to method. The clusters obtained by the hierarchical and Taylor–Butina methods are more computationally expensive to use in cross-validation strategies, and both cluster the molecules poorly. In contrast, the UMAP method provides the best quality, and therefore we recommend it to analyze this highly valuable dataset.

Journal article

Ogunleye AZ, Piyawajanusorn C, Goncalves A, Ghislat G, Ballester PJ et al., 2022, Interpretable machine learning models to predict the resistance of breast cancer patients to doxorubicin from their microRNA profiles, Advanced Science, Vol: 9, ISSN: 2198-3844

Doxorubicin is a common treatment for breast cancer. However, not all patients respond to this drug, which sometimes causes life-threatening side effects. Accurately anticipating doxorubicin-resistant patients would therefore make it possible to spare them this risk while considering alternative treatments without delay. Stratifying patients based on molecular markers in their pretreatment tumors is a promising approach to advance toward this ambitious goal, but single-gene markers such as HER2 expression have not been shown to be sufficiently predictive. The recent availability of matched doxorubicin-response and diverse molecular profiles across breast cancer patients now permits analysis at a much larger scale. 16 machine learning algorithms and 8 molecular profiles are systematically evaluated on the same cohort of patients. Only 2 of the 128 resulting models are substantially predictive, showing that they can be easily missed by a standard-scale analysis. The best model is a classification and regression tree (CART) nonlinearly combining 4 selected miRNA isoforms to predict doxorubicin response (median Matthews correlation coefficient (MCC) and area under the curve (AUC) of 0.56 and 0.80, respectively). By contrast, HER2 expression is significantly less predictive (median MCC and AUC of 0.14 and 0.57, respectively). As the predictive accuracy of this CART model increases with larger training sets, its update with future data should result in even better accuracy.

Journal article

Ballester PJ, Stevens R, Haibe-Kains B, Huang RS, Aittokallio T et al., 2022, Artificial intelligence for drug response prediction in disease models, Briefings in Bioinformatics, Vol: 23, ISSN: 1467-5463

Journal article

Hernández-Hernández S, Vishwakarma S, Ballester PJ, 2022, Conformal prediction of small-molecule drug resistance in cancer cell lines, Volume 179: Conformal and Probabilistic Prediction with Applications, Publisher: Machine Learning Research, Pages: 92-108

Drug design is a critical step in the drug discovery process, where promising drug molecules are engineered to be later evaluated preclinically and perhaps clinically. Phenotypic drug design has again gained traction. Cancer cell lines, a frequently adopted in vitro model for phenotypic drug design, can be used to evaluate the drug resistance level (lack of inhibitory activity, for example) of a large number of molecules, and discard those that are the least likely to become drug candidates. By reusing these datasets, supervised learning models have been built to predict drug resistance on cancer cell lines. Usually, these methods have assigned reliability to the whole model rather than reliability to individual predictions (molecules). In problems such as drug design, accurately achieving the latter would revolutionize decision making. Conformal prediction is a model-agnostic method to assign reliability to each model prediction. In this study, we investigated the impact of conformal prediction on the prediction of inhibitory activity of molecules on a given cancer cell line. This analysis was carried out in each of the 60 cell lines from the NCI-60 panel to understand the variability of the results across cancer types. We also discussed the implications of predicting the molecules considered most potent. In addition, we investigated how the further subdivision of the training set to build conformal prediction models may affect the results obtained. Overall, we observed that those molecules deemed most reliable by conformal prediction are substantially better predicted than those that are not. This suggests that such computational tools are promising to guide phenotypic drug design.
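The idea of further subdividing the training set to build a conformal predictor can be sketched with split (inductive) conformal regression. This is a generic textbook variant under stated assumptions, not necessarily the procedure used in the paper: a held-out calibration subset provides absolute residuals of any fitted model, and a quantile of those residuals becomes the half-width of each prediction interval. The function names are hypothetical.

```python
import math


def conformal_half_width(calib_true, calib_pred, alpha=0.2):
    """Half-width of a split-conformal prediction interval.

    calib_true / calib_pred: observed and predicted activities on a
    calibration subset held out from model fitting (assumed exchangeable
    with future molecules). alpha is the tolerated miscoverage rate.
    """
    residuals = sorted(abs(t - p) for t, p in zip(calib_true, calib_pred))
    n = len(residuals)
    # Finite-sample corrected rank of the (1 - alpha) quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to guarantee the requested coverage.
        return math.inf
    return residuals[rank - 1]


def prediction_interval(point_prediction, half_width):
    """Symmetric interval around a single model prediction."""
    return (point_prediction - half_width, point_prediction + half_width)
```

In this plain variant every molecule receives the same interval width. Normalized nonconformity scores, which are not shown here, are one way to obtain per-molecule widths that reflect how reliable each individual prediction is.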

Conference paper

Tran-Nguyen V-K, Simeon S, Junaid M, Ballester PJ et al., 2022, Structure-based virtual screening for PDL1 dimerizers: evaluating generic scoring functions, Current Research in Structural Biology, Vol: 4, Pages: 206-210, ISSN: 2665-928X

The interaction between PD1 and its ligand PDL1 has been shown to render tumor cells resistant to apoptosis and promote tumor progression. An innovative mechanism to inhibit the PD1/PDL1 interaction is PDL1 dimerization induced by small-molecule PDL1 binders. Structure-based virtual screening is a promising approach to discovering such small-molecule PD1/PDL1 inhibitors. Here we investigate which type of generic scoring functions is most suitable to tackle this problem. We consider CNN-Score, an ensemble of convolutional neural networks, as the representative of machine-learning scoring functions. We also evaluate Smina, a commonly used classical scoring function, and IFP, a top structural fingerprint similarity scoring function. These three types of scoring functions were evaluated on two test sets sharing the same set of small-molecule PD1/PDL1 inhibitors, but using different types of inactives: either true inactives (molecules with no in vitro PD1/PDL1 inhibition activity) or assumed inactives (property-matched decoy molecules generated from each active). On both test sets, CNN-Score performed much better than Smina, which in turn strongly outperformed IFP. The fact that the latter was the case, despite precluding any possibility of exploiting decoy bias, demonstrates the predictive value of CNN-Score for PDL1. These results suggest that re-scoring Smina-docked molecules with CNN-Score is a promising structure-based virtual screening method to discover new small-molecule inhibitors of this therapeutic target.

Journal article

Simeon S, Ghislat G, Ballester P, 2021, Characterizing the Relationship Between the Chemical Structures of Drugs and their Activities on Primary Cultures of Pediatric Solid Tumors, Current Medicinal Chemistry, Vol: 28, Pages: 7830-7839

Journal article

Ghislat G, Rahman T, Ballester PJ, 2021, Recent progress on the prospective application of machine learning to structure-based virtual screening, Current Opinion in Chemical Biology, Vol: 65, Pages: 28-34

Journal article

Frasser CF, Benito CD, Skibinsky-Gitlin ES, Canals V, Font-Rosselló J, Roca M, Ballester PJ, Rosselló JL et al., 2021, Using stochastic computing for virtual screening acceleration, Electronics, Vol: 10, ISSN: 2079-9292

Stochastic computing is an emerging scientific field pushed by the need for developing high-performance artificial intelligence systems in hardware to quickly solve complex data processing problems. This is the case of virtual screening, a computational task aimed at searching across huge molecular databases for new drug leads. In this work, we show a classification framework in which molecules are described by an energy-based vector. This vector is then processed by an ultra-fast artificial neural network implemented through FPGA by using stochastic computing techniques. Compared to other previously published virtual screening methods, this proposal provides similar or higher accuracy, while it improves processing speed by about two or three orders of magnitude.

Journal article

Nguyen LC, Naulaerts S, Bruna A, Ghislat G, Ballester PJ et al., 2021, Predicting cancer drug response in vivo by learning an optimal feature selection of tumour molecular profiles, Biomedicines, Vol: 9, ISSN: 2227-9059

(1) Background: Inter-tumour heterogeneity is one of cancer’s most fundamental features. Patient stratification based on drug response prediction is hence needed for effective anti-cancer therapy. However, single-gene markers of response are rare and/or may fail to achieve a significant impact in the clinic. Machine Learning (ML) is emerging as a particularly promising complementary approach to precision oncology. (2) Methods: Here we leverage comprehensive Patient-Derived Xenograft (PDX) pharmacogenomic data sets with dimensionality-reducing ML algorithms for this purpose. (3) Results: Combining multiple gene alterations via ML leads to better discrimination between sensitive and resistant PDXs in 19 of the 26 analysed cases. Highly predictive ML models employing concise gene lists were found for three cases: paclitaxel (breast cancer), binimetinib (breast cancer) and cetuximab (colorectal cancer). Interestingly, each of these multi-gene ML models identifies some treatment-responsive PDXs not harbouring the best actionable mutation for that case. Thus, ML multi-gene predictors generally have far fewer false negatives than the corresponding single-gene marker. (4) Conclusions: As PDXs often recapitulate clinical outcomes, these results suggest that many more patients could benefit from precision oncology if ML algorithms were also applied to existing clinical pharmacogenomics data, especially those algorithms generating classifiers combining data-selected gene alterations.

Journal article

Ballester PJ, Carmona J, 2021, Artificial intelligence for the next generation of precision oncology, npj Precision Oncology, Vol: 5, ISSN: 2397-768X

Journal article

Piyawajanusorn C, Nguyen LC, Ghislat G, Ballester PJ et al., 2021, A gentle introduction to understanding preclinical data for cancer pharmaco-omic modeling, Briefings in Bioinformatics

Journal article

Ghislat G, Cheema AS, Baudoin E, Verthuy C, Ballester PJ, Crozat K, Attaf N, Dong C, Milpied P, Malissen B, Auphan-Anezin N, Manh TPV, Dalod M, Lawrence T et al., 2021, NF-κB-dependent IRF1 activation programs cDC1 dendritic cells to drive antitumor immunity, Science Immunology, Vol: 6, ISSN: 2470-9468

Conventional type 1 dendritic cells (cDC1s) are critical for antitumor immunity. They acquire antigens from dying tumor cells and cross-present them to CD8+ T cells, promoting the expansion of tumor-specific cytotoxic T cells. However, the signaling pathways that govern the antitumor functions of cDC1s in immunogenic tumors are poorly understood. Using single-cell transcriptomics to examine the molecular pathways regulating intratumoral cDC1 maturation, we found nuclear factor κB (NF-κB) and interferon (IFN) pathways to be highly enriched in a subset of functionally mature cDC1s. We identified an NF-κB–dependent and IFN-γ–regulated gene network in cDC1s, including cytokines and chemokines specialized in the recruitment and activation of cytotoxic T cells. By mapping the trajectory of intratumoral cDC1 maturation, we demonstrated the dynamic reprogramming of tumor-infiltrating cDC1s by NF-κB and IFN signaling pathways. This maturation process was perturbed by specific inactivation of either NF-κB or IFN regulatory factor 1 (IRF1) in cDC1s, resulting in impaired expression of IFN-γ–responsive genes and consequently a failure to efficiently recruit and activate antitumoral CD8+ T cells. Last, we demonstrate the relevance of these findings to patients with melanoma, showing that activation of the NF-κB/IRF1 axis in association with cDC1s is linked with improved clinical outcome. The NF-κB/IRF1 axis in cDC1s may therefore represent an important focal point for the development of new diagnostic and therapeutic approaches to improve cancer immunotherapy.

Journal article

Fresnais L, Ballester PJ, 2021, The impact of compound library size on the performance of scoring functions for structure-based virtual screening, Briefings in Bioinformatics, Vol: 22, ISSN: 1467-5463

Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.

Journal article

Li H, Sze K-H, Lu G, Ballester PJ et al., 2021, Machine-learning scoring functions for structure-based virtual screening, Wiley Interdisciplinary Reviews: Computational Molecular Science, Vol: 11, ISSN: 1759-0876

Molecular docking predicts whether and how small molecules bind to a macromolecular target using a suitable 3D structure. Scoring functions for structure-based virtual screening primarily aim at discovering which molecules bind to the considered target when these form part of a library with a much higher proportion of non-binders. Classical scoring functions are essentially models building a linear mapping between the features describing a protein–ligand complex and its binding label. Machine learning, a major subfield of artificial intelligence, can also be used to build fast supervised learning models for this task. In this review, we analyzed such machine-learning scoring functions for structure-based virtual screening in the period 2015–2019. We have discussed what the shortcomings of current benchmarks really mean and what valid alternatives have been employed. The latter retrospective studies observed that machine-learning scoring functions were substantially more accurate, in terms of higher hit rates and potencies, than the classical scoring functions they were compared to. Several of these machine-learning scoring functions were also employed in prospective studies, in which mid-nanomolar binders with novel chemical structures were directly discovered without any potency optimization. We have thus highlighted the codes and webservers that are available to build or apply machine-learning scoring functions to prospective structure-based virtual screening studies. A discussion of prospects for future work completes this review.

Journal article

Ghislat G, Cheema AS, Baudoin E, Verthuy C, Ballester P, Crozat K, Attaf N, Dong C, Milpied P, Malissen B, Auphan-Anezin N, Manh T-PV, Dalod M, Lawrence T et al., 2020, An NF-κB/IRF1 axis programs cDC1s to drive anti-tumor immunity

Journal article

Ariey-Bonnet J, Carrasco K, Grand ML, Hoffer L, Betzi S, Feracci M, Tsvetkov P, Devred F, Collette Y, Morelli X, Ballester P, Pasquier E et al., 2020, In silico molecular target prediction unveils mebendazole as a potent MAPK14 inhibitor, Molecular Oncology, Vol: 14, Pages: 3083-3099, ISSN: 1574-7891

The concept of polypharmacology involves the interaction of drug molecules with multiple molecular targets. It provides a unique opportunity for the repurposing of already-approved drugs to target key factors involved in human diseases. Herein, we used an in silico target prediction algorithm to investigate the mechanism of action of mebendazole, an antihelminthic drug, currently repurposed in the treatment of brain tumors. First, we confirmed that mebendazole decreased the viability of glioblastoma cells in vitro (IC50 values ranging from 288 nM to 2.1 µM). Our in silico approach unveiled 21 putative molecular targets for mebendazole, including 12 proteins significantly upregulated at the gene level in glioblastoma as compared to normal brain tissue (fold change > 1.5; P < 0.0001). Validation experiments were performed on three major kinases involved in cancer biology: ABL1, MAPK1/ERK2, and MAPK14/p38α. Mebendazole could inhibit the activity of these kinases in vitro in a dose-dependent manner, with a high potency against MAPK14 (IC50 = 104 ± 46 nM). Its direct binding to MAPK14 was further validated in vitro, and inhibition of MAPK14 kinase activity was confirmed in live glioblastoma cells. Consistent with biophysical data, molecular modeling suggested that mebendazole was able to bind to the catalytic site of MAPK14. Finally, gene silencing demonstrated that MAPK14 is involved in glioblastoma tumor spheroid growth and response to mebendazole treatment. This study thus highlighted the role of MAPK14 in the anticancer mechanism of action of mebendazole and provides further rationale for the pharmacological targeting of MAPK14 in brain tumors. It also opens new avenues for the development of novel MAPK14/p38α inhibitors to treat human diseases.

Journal article

Ghislat G, Rahman T, Ballester PJ, 2020, Identification and validation of carbonic anhydrase II as the first target of the anti-inflammatory drug actarit, Biomolecules, Vol: 10, ISSN: 2218-273X

Background and purpose: Identifying the macromolecular targets of drug molecules is a fundamental aspect of drug discovery and pharmacology. Several drugs remain without known targets (orphan) despite large-scale in silico and in vitro target prediction efforts. Ligand-centric chemical-similarity-based methods for in silico target prediction have been found to be particularly powerful, but the question remains of whether they are able to discover targets for target-orphan drugs. Experimental Approach: We used one of these in silico methods to carry out a target prediction analysis for two orphan drugs: actarit and malotilate. The top target predicted for each drug was carbonic anhydrase II (CAII). Each drug was therefore quantitatively evaluated for CAII inhibition to validate these two prospective predictions. Key Results: Actarit showed in vitro concentration-dependent inhibition of CAII activity with submicromolar potency (IC50 = 422 nM) whilst no consistent inhibition was observed for malotilate. Among the other 25 targets predicted for actarit, RORγ (RAR-related orphan receptor-gamma) is promising in that it is strongly related to actarit’s indication, rheumatoid arthritis (RA). Conclusion and Implications: This study is a proof-of-concept of the utility of MolTarPred for the fast and cost-effective identification of targets of orphan drugs. Furthermore, the mechanism of action of actarit as an anti-RA agent can now be re-examined from a CAII-inhibitor perspective, given existing relationships between this target and RA. Moreover, the confirmed CAII-actarit association supports investigating the repositioning of actarit on other CAII-linked indications (e.g., hypertension, epilepsy, migraine, anemia and bone, eye and cardiac disorders).

Journal article

Li H, Sze K-H, Lu G, Ballester PJ et al., 2020, Machine-learning scoring functions for structure-based drug lead optimization, Wiley Interdisciplinary Reviews: Computational Molecular Science, Vol: 10, ISSN: 1759-0876

Molecular docking can be used to predict how strongly small-molecule binders and their chemical derivatives bind to a macromolecular target using its available three-dimensional structures. Scoring functions (SFs) are employed to rank these molecules by their predicted binding affinity (potency). A classical SF assumes a predetermined theory-inspired functional form for the relationship between the features characterizing the structure of the protein–ligand complex and its predicted binding affinity (this relationship is almost always assumed to be linear). Recent years have seen the prosperity of machine-learning SFs, which are fast regression models built instead with contemporary supervised learning algorithms. In this review, we analyzed machine-learning SFs for drug lead optimization in the 2015–2019 period. The performance gap between classical and machine-learning SFs was large and has now broadened owing to methodological improvements and the availability of more training data. Against the expectations of many experts, SFs employing deep learning techniques were not always more predictive than those based on more established machine learning techniques and, when they were, the performance gain was small. More codes and webservers are available and ready to be applied to prospective structure-based drug lead optimization studies. These have exhibited excellent predictive accuracy in compelling retrospective tests, outperforming in some cases much more computationally demanding molecular simulation-based methods. A discussion of future work completes this review.

Journal article

Ahmad S, Ballester PJ, Fernandez M, 2020, Editorial: Intelligent systems for genome functional annotations, Frontiers in Genetics, Vol: 11, ISSN: 1664-8021

Journal article

Naulaerts S, Menden MP, Ballester PJ, 2020, Concise polygenic models for cancer-specific identification of drug-sensitive tumors from their multi-omics profiles, Biomolecules, Vol: 10, ISSN: 2218-273X

In silico models to predict which tumors will respond to a given drug are necessary for Precision Oncology. However, predictive models are only available for a handful of cases (each case being a given drug acting on tumors of a specific cancer type). A way to generate predictive models for the remaining cases is with suitable machine learning algorithms that are yet to be applied to existing in vitro pharmacogenomics datasets. Here, we apply XGBoost integrated with a stringent feature selection approach, which is an algorithm that is advantageous for these high-dimensional problems. Thus, we identified and validated 118 predictive models for 62 drugs across five cancer types by exploiting four molecular profiles (sequence mutations, copy-number alterations, gene expression, and DNA methylation). Predictive models were found in each cancer type and with every molecular profile. On average, no omics profile or cancer type obtained models with higher predictive accuracy than the rest. However, within a given cancer type, some molecular profiles were overrepresented among predictive models. For instance, CNA profiles were predictive in breast invasive carcinoma (BRCA) cell lines, but not in small cell lung cancer (SCLC) cell lines where gene expression (GEX) and DNA methylation profiles were the most predictive. Lastly, we identified the best XGBoost model per cancer type and analyzed their selected features. For each model, some of the genes in the selected list had already been found to be individually linked to the response to that drug, providing additional evidence of the usefulness of these models and the merits of the feature selection scheme.

Journal article

Ballester P, 2020, Stochastic-based Neural Network hardware acceleration for an efficient ligand-based virtual screening

Other
