Ballester P, 2024, Inactive-enriched machine-learning models exploiting patent data improve structure-based virtual screening for PDL1 dimerizers, Journal of Advanced Research, ISSN: 2090-1232
Ogunleye A, Piyawajanusorn C, Ghislat G, et al., 2024, Large-scale machine learning analysis reveals DNA methylation and gene expression response signatures for gemcitabine-treated pancreatic cancer, Health Data Science, Vol: 4, ISSN: 2765-8783
Background: Gemcitabine is a first-line chemotherapy for pancreatic adenocarcinoma (PAAD), but many PAAD patients do not respond to gemcitabine-containing treatments. Being able to predict such nonresponders would hence permit the prompt administration of more promising treatments while sparing those patients the life-threatening side effects of gemcitabine. Unfortunately, the few predictors of PAAD patient response to this drug are weak, and none of them yet exploits the power of machine learning (ML). Methods: Here, we applied ML to predict the response of PAAD patients to gemcitabine from the molecular profiles of their tumors. More concretely, we collected diverse molecular profiles of PAAD patient tumors along with the corresponding clinical data (gemcitabine responses and clinical features) from the Genomic Data Commons resource. From systematically combining 8 tumor profiles with 16 classification algorithms, each of the resulting 128 ML models was evaluated by multiple 10-fold cross-validations. Results: Only 7 of these 128 models were predictive, which underlines the importance of carrying out such a large-scale analysis to avoid missing the most predictive models. Here, these were random forest using 4 selected mRNAs [0.44 Matthews correlation coefficient (MCC), 0.785 receiver operating characteristic–area under the curve (ROC-AUC)] and XGBoost combining 12 DNA methylation probes (0.32 MCC, 0.697 ROC-AUC). By contrast, the hENT1 marker obtained much worse, random-level performance (practically 0 MCC, 0.5 ROC-AUC). Despite not being trained to predict prognosis (overall and progression-free survival), these ML models were also able to anticipate this patient outcome. Conclusions: We release these promising ML models so that they can be evaluated prospectively on other gemcitabine-treated PAAD patients.
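For readers unfamiliar with the MCC values quoted above, the following is a minimal sketch of how the Matthews correlation coefficient is computed from a binary confusion matrix. The function name `matthews_corrcoef_from_counts` and the convention of returning 0.0 when the denominator vanishes are illustrative assumptions, not taken from the paper.

```python
import math

def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn: true positives, true negatives, false positives,
    false negatives. Returns a value in [-1, 1]; 0 is random-level.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: a degenerate matrix (e.g. every patient
    # predicted as a responder) is scored as 0.0.
    if denominator == 0:
        return 0.0
    return numerator / denominator
```

A perfect classifier gives 1, whereas a marker that performs at random level gives a value close to 0, which is how the hENT1 result above should be read.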
Ballester PJ, 2023, The AI revolution in chemistry is not that far away, Nature, Vol: 624, Pages: 252-252, ISSN: 0028-0836
Tran-Nguyen V-K, Junaid M, Simeon S, et al., 2023, A practical guide to machine-learning scoring for structure-based virtual screening, Nature Protocols, Vol: 18, Pages: 3460-3511, ISSN: 1750-2799
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol, can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
UK Government Office for Science, 2023, Future risks of frontier AI, Publisher: UK Government Office for Science
Loecher A, Bruyns-Haylett M, Ballester PJ, et al., 2023, A machine learning approach to predict cellular uptake of pBAE polyplexes, Biomaterials Science, Vol: 11, Pages: 5797-5808, ISSN: 2047-4830
The delivery of genetic material (DNA and RNA) to cells can cure a wide range of diseases but is limited by the delivery efficiency of the carrier system. Poly β-amino esters (pBAEs) are promising polymer-based vectors that form polyplexes with negatively charged oligonucleotides, enabling cell membrane uptake and gene delivery. pBAE backbone polymer chemistry, as well as terminal oligopeptide modifications, define cellular uptake and transfection efficiency in a given cell line, along with nanoparticle size and polydispersity. Moreover, uptake and transfection efficiency of a given polyplex formulation also vary from cell type to cell type. Therefore, finding the optimal formulation leading to high uptake in a new cell line is dictated by trial and error, and requires time and resources. Machine learning (ML) is an ideal in silico screening tool to learn the non-linearities of complex data sets, like the one presented herein, with the aim of predicting cellular internalisation of pBAE polyplexes. A library of pBAE nanoparticles was fabricated and the uptake studied in 4 different cell lines, on which various ML models were successfully trained. The best performing models were found to be gradient-boosted trees and neural networks. The gradient-boosted trees model was then analysed using SHapley Additive exPlanations, to interpret the model and gain an understanding into the important features and their impact on the predicted outcome.
Tran-Nguyen V-K, Ballester P, 2023, Beware of simple methods for structure-based virtual screening: the critical importance of broader comparisons, Journal of Chemical Information and Modeling, Vol: 63, Pages: 1401-1405, ISSN: 1549-9596
We discuss how data unbiasing and simple methods such as protein-ligand Interaction FingerPrint (IFP) can overestimate virtual screening performance. We also show that IFP is strongly outperformed by target-specific machine-learning scoring functions, which were not considered in a recent report concluding that simple methods were better than machine-learning scoring functions at virtual screening.
Hernandez-Hernandez S, Ballester PJ, 2023, On the best way to cluster NCI-60 molecules, Biomolecules, Vol: 13, ISSN: 2218-273X
Machine learning-based models have been widely used in the early drug-design pipeline. To validate these models, cross-validation strategies have been employed, including those using clustering of molecules in terms of their chemical structures. However, the poor clustering of compounds will compromise such validation, especially on test molecules dissimilar to those in the training set. This study aims at finding the best way to cluster the molecules screened by the National Cancer Institute (NCI)-60 project by comparing hierarchical, Taylor–Butina, and uniform manifold approximation and projection (UMAP) clustering methods. The best-performing algorithm can then be used to generate clusters for model validation strategies. This study also aims at measuring the impact of removing outlier molecules prior to the clustering step. Clustering results are evaluated using three well-known clustering quality metrics. In addition, we compute an average similarity matrix to assess the quality of each cluster. The results show variation in clustering quality from method to method. The clusters obtained by the hierarchical and Taylor–Butina methods are more computationally expensive to use in cross-validation strategies, and both cluster the molecules poorly. In contrast, the UMAP method provides the best quality, and therefore we recommend it to analyze this highly valuable dataset.
Ogunleye AZ, Piyawajanusorn C, Goncalves A, et al., 2022, Interpretable machine learning models to predict the resistance of breast cancer patients to doxorubicin from their microRNA profiles, Advanced Science, Vol: 9, ISSN: 2198-3844
Doxorubicin is a common treatment for breast cancer. However, not all patients respond to this drug, which sometimes causes life-threatening side effects. Accurately anticipating doxorubicin-resistant patients would therefore spare them this risk while allowing alternative treatments to be considered without delay. Stratifying patients based on molecular markers in their pretreatment tumors is a promising approach to advance toward this ambitious goal, but single-gene markers such as HER2 expression have not been shown to be sufficiently predictive. The recent availability of matched doxorubicin-response and diverse molecular profiles across breast cancer patients now permits analysis at a much larger scale. 16 machine learning algorithms and 8 molecular profiles are systematically evaluated on the same cohort of patients. Only 2 of the 128 resulting models are substantially predictive, showing that they can be easily missed by a standard-scale analysis. The best model is classification and regression tree (CART) nonlinearly combining 4 selected miRNA isoforms to predict doxorubicin response (median Matthews correlation coefficient (MCC) and area under the curve (AUC) of 0.56 and 0.80, respectively). By contrast, HER2 expression is significantly less predictive (median MCC and AUC of 0.14 and 0.57, respectively). As the predictive accuracy of this CART model increases with larger training sets, its update with future data should result in even better accuracy.
Ballester PJ, Stevens R, Haibe-Kains B, et al., 2022, Artificial intelligence for drug response prediction in disease models, Briefings in Bioinformatics, Vol: 23, ISSN: 1467-5463
Hernández-Hernández S, Vishwakarma S, Ballester PJ, 2022, Conformal prediction of small-molecule drug resistance in cancer cell lines, Volume 179: Conformal and Probabilistic Prediction with Applications, Publisher: Machine Learning Research, Pages: 92-108
Drug design is a critical step in the drug discovery process, where promising drug molecules are engineered to be later evaluated preclinically and perhaps clinically. Phenotypic drug design has again gained traction. Cancer cell lines, a frequently adopted in vitro model for phenotypic drug design, can be used to evaluate the drug resistance level (lack of inhibitory activity, for example) of a large number of molecules, and discard those that are the least likely to become drug candidates. By reusing these datasets, supervised learning models have been built to predict drug resistance on cancer cell lines. Usually, these methods have assigned reliability to the whole model rather than reliability to individual predictions (molecules). In problems such as drug design, accurately achieving the latter would revolutionize decision making. Conformal prediction is a model-agnostic method to assign reliability to each model prediction. In this study, we investigated the impact of conformal prediction on the prediction of inhibitory activity of molecules on a given cancer cell line. This analysis was carried out in each of the 60 cell lines from the NCI-60 panel to understand the variability of the results across cancer types. We also discussed the implications of predicting the molecules considered most potent. In addition, we investigated how the further subdivision of the training set to build conformal prediction models may affect the results obtained. Overall, we observed that those molecules deemed most reliable by conformal prediction are substantially better predicted than those that are not. This suggests that such computational tools are promising to guide phenotypic drug design.
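As a rough illustration of how conformal prediction attaches reliability to individual predictions, here is a minimal sketch of split (inductive) conformal prediction for a regression model. It assumes absolute residuals as the nonconformity score and a held-out calibration set carved out of the training data; the function names are hypothetical and this is not necessarily the setup used in the study.

```python
import math

def conformal_quantile(cal_true, cal_pred, alpha=0.2):
    """Half-width of the prediction interval at miscoverage level alpha.

    cal_true / cal_pred: measured and predicted values on a calibration
    set that the underlying model did not see during training.
    """
    # Nonconformity score (assumed): absolute prediction error.
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank of the (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to guarantee this coverage level.
        return float("inf")
    return scores[k - 1]

def prediction_interval(prediction, half_width):
    """Interval expected to cover the true value with probability >= 1 - alpha."""
    return (prediction - half_width, prediction + half_width)
```

The calibration set is the "further subdivision of the training set" mentioned above: the more data is set aside for calibration, the less remains to train the underlying model, which is the trade-off the abstract refers to.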
Tran-Nguyen V-K, Simeon S, Junaid M, et al., 2022, Structure-based virtual screening for PDL1 dimerizers: evaluating generic scoring functions, Current Research in Structural Biology, Vol: 4, Pages: 206-210, ISSN: 2665-928X
The interaction between PD1 and its ligand PDL1 has been shown to render tumor cells resistant to apoptosis and promote tumor progression. An innovative mechanism to inhibit the PD1/PDL1 interaction is PDL1 dimerization induced by small-molecule PDL1 binders. Structure-based virtual screening is a promising approach to discovering such small-molecule PD1/PDL1 inhibitors. Here we investigate which type of generic scoring functions is most suitable to tackle this problem. We consider CNN-Score, an ensemble of convolutional neural networks, as the representative of machine-learning scoring functions. We also evaluate Smina, a commonly used classical scoring function, and IFP, a top structural fingerprint similarity scoring function. These three types of scoring functions were evaluated on two test sets sharing the same set of small-molecule PD1/PDL1 inhibitors, but using different types of inactives: either true inactives (molecules with no in vitro PD1/PDL1 inhibition activity) or assumed inactives (property-matched decoy molecules generated from each active). On both test sets, CNN-Score performed much better than Smina, which in turn strongly outperformed IFP. The fact that the latter was the case, despite precluding any possibility of exploiting decoy bias, demonstrates the predictive value of CNN-Score for PDL1. These results suggest that re-scoring Smina-docked molecules with CNN-Score is a promising structure-based virtual screening method to discover new small-molecule inhibitors of this therapeutic target.
Ghislat G, Rahman T, Ballester PJ, 2021, Recent progress on the prospective application of machine learning to structure-based virtual screening, Current Opinion in Chemical Biology, Vol: 65, Pages: 28-34
Simeon S, Ghislat G, Ballester P, 2021, Characterizing the Relationship Between the Chemical Structures of Drugs and their Activities on Primary Cultures of Pediatric Solid Tumors, Current Medicinal Chemistry, Vol: 28, Pages: 7830-7839
Frasser CF, Benito CD, Skibinsky-Gitlin ES, et al., 2021, Using stochastic computing for virtual screening acceleration, Electronics, Vol: 10, ISSN: 2079-9292
Stochastic computing is an emerging scientific field pushed by the need for developing high-performance artificial intelligence systems in hardware to quickly solve complex data processing problems. This is the case of virtual screening, a computational task aimed at searching across huge molecular databases for new drug leads. In this work, we show a classification framework in which molecules are described by an energy-based vector. This vector is then processed by an ultra-fast artificial neural network implemented through FPGA by using stochastic computing techniques. Compared to other previously published virtual screening methods, this proposal provides similar or higher accuracy, while it improves processing speed by about two or three orders of magnitude.
Nguyen LC, Naulaerts S, Bruna A, et al., 2021, Predicting cancer drug response in vivo by learning an optimal feature selection of tumour molecular profiles, Biomedicines, Vol: 9, ISSN: 2227-9059
(1) Background: Inter-tumour heterogeneity is one of cancer’s most fundamental features. Patient stratification based on drug response prediction is hence needed for effective anti-cancer therapy. However, single-gene markers of response are rare and/or may fail to achieve a significant impact in the clinic. Machine Learning (ML) is emerging as a particularly promising complementary approach to precision oncology. (2) Methods: Here we leverage comprehensive Patient-Derived Xenograft (PDX) pharmacogenomic data sets with dimensionality-reducing ML algorithms for this purpose. (3) Results: Combining multiple gene alterations via ML leads to better discrimination between sensitive and resistant PDXs in 19 of the 26 analysed cases. Highly predictive ML models employing concise gene lists were found for three cases: paclitaxel (breast cancer), binimetinib (breast cancer) and cetuximab (colorectal cancer). Interestingly, each of these multi-gene ML models identifies some treatment-responsive PDXs not harbouring the best actionable mutation for that case. Thus, ML multi-gene predictors generally have far fewer false negatives than the corresponding single-gene marker. (4) Conclusions: As PDXs often recapitulate clinical outcomes, these results suggest that many more patients could benefit from precision oncology if ML algorithms were also applied to existing clinical pharmacogenomics data, especially those algorithms generating classifiers combining data-selected gene alterations.
Ballester PJ, Carmona J, 2021, Artificial intelligence for the next generation of precision oncology, npj Precision Oncology, Vol: 5, ISSN: 2397-768X
Piyawajanusorn C, Nguyen LC, Ghislat G, et al., 2021, A gentle introduction to understanding preclinical data for cancer pharmaco-omic modeling, Briefings in Bioinformatics
Ghislat G, Cheema AS, Baudoin E, et al., 2021, NF-κB-dependent IRF1 activation programs cDC1 dendritic cells to drive antitumor immunity, Science Immunology, Vol: 6, ISSN: 2470-9468
Conventional type 1 dendritic cells (cDC1s) are critical for antitumor immunity. They acquire antigens from dying tumor cells and cross-present them to CD8+ T cells, promoting the expansion of tumor-specific cytotoxic T cells. However, the signaling pathways that govern the antitumor functions of cDC1s in immunogenic tumors are poorly understood. Using single-cell transcriptomics to examine the molecular pathways regulating intratumoral cDC1 maturation, we found nuclear factor κB (NF-κB) and interferon (IFN) pathways to be highly enriched in a subset of functionally mature cDC1s. We identified an NF-κB–dependent and IFN-γ–regulated gene network in cDC1s, including cytokines and chemokines specialized in the recruitment and activation of cytotoxic T cells. By mapping the trajectory of intratumoral cDC1 maturation, we demonstrated the dynamic reprogramming of tumor-infiltrating cDC1s by NF-κB and IFN signaling pathways. This maturation process was perturbed by specific inactivation of either NF-κB or IFN regulatory factor 1 (IRF1) in cDC1s, resulting in impaired expression of IFN-γ–responsive genes and consequently a failure to efficiently recruit and activate antitumoral CD8+ T cells. Last, we demonstrate the relevance of these findings to patients with melanoma, showing that activation of the NF-κB/IRF1 axis in association with cDC1s is linked with improved clinical outcome. The NF-κB/IRF1 axis in cDC1s may therefore represent an important focal point for the development of new diagnostic and therapeutic approaches to improve cancer immunotherapy.
Fresnais L, Ballester PJ, 2021, The impact of compound library size on the performance of scoring functions for structure-based virtual screening, Briefings in Bioinformatics, Vol: 22, ISSN: 1467-5463
Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.
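The hit rate referred to above is the fraction of true actives among the top-ranked molecules of a screened library. The sketch below shows one simple way to compute it; the function name `hit_rate` and the convention that a higher score means a more favourable prediction are assumptions made for illustration (some docking tools report energies where lower is better, in which case the sign must be flipped first).

```python
def hit_rate(scores, is_active, top_n):
    """Fraction of actives among the top_n highest-scoring molecules.

    scores: one predicted score per molecule (assumed: higher is better).
    is_active: matching booleans from experimental testing.
    """
    if top_n <= 0 or not scores:
        raise ValueError("top_n must be positive and scores non-empty")
    # Rank molecules from best to worst predicted score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    top = ranked[:top_n]
    return sum(1 for _, active in top if active) / len(top)
```

Comparing this quantity for two scoring functions on the same library and the same `top_n` gives the kind of fold increase in hit rate quoted in the abstract.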
Li H, Sze K-H, Lu G, et al., 2021, Machine-learning scoring functions for structure-based virtual screening, Wiley Interdisciplinary Reviews: Computational Molecular Science, Vol: 11, ISSN: 1759-0876
Molecular docking predicts whether and how small molecules bind to a macromolecular target using a suitable 3D structure. Scoring functions for structure-based virtual screening primarily aim at discovering which molecules bind to the considered target when these form part of a library with a much higher proportion of non-binders. Classical scoring functions are essentially models building a linear mapping between the features describing a protein–ligand complex and its binding label. Machine learning, a major subfield of artificial intelligence, can also be used to build fast supervised learning models for this task. In this review, we analyzed such machine-learning scoring functions for structure-based virtual screening in the period 2015–2019. We have discussed what the shortcomings of current benchmarks really mean and what valid alternatives have been employed. The latter retrospective studies observed that machine-learning scoring functions were substantially more accurate, in terms of higher hit rates and potencies, than the classical scoring functions they were compared to. Several of these machine-learning scoring functions were also employed in prospective studies, in which mid-nanomolar binders with novel chemical structures were directly discovered without any potency optimization. We have thus highlighted the codes and webservers that are available to build or apply machine-learning scoring functions to prospective structure-based virtual screening studies. A discussion of prospects for future work completes this review.
Ghislat G, Cheema AS, Baudoin E, et al., 2020, An NF-κB/IRF1 axis programs cDC1s to drive anti-tumor immunity
Ariey-Bonnet J, Carrasco K, Grand ML, et al., 2020, In silico molecular target prediction unveils mebendazole as a potent MAPK14 inhibitor, Molecular Oncology, Vol: 14, Pages: 3083-3099, ISSN: 1574-7891
The concept of polypharmacology involves the interaction of drug molecules with multiple molecular targets. It provides a unique opportunity for the repurposing of already-approved drugs to target key factors involved in human diseases. Herein, we used an in silico target prediction algorithm to investigate the mechanism of action of mebendazole, an antihelminthic drug, currently repurposed in the treatment of brain tumors. First, we confirmed that mebendazole decreased the viability of glioblastoma cells in vitro (IC50 values ranging from 288 nM to 2.1 µM). Our in silico approach unveiled 21 putative molecular targets for mebendazole, including 12 proteins significantly upregulated at the gene level in glioblastoma as compared to normal brain tissue (fold change > 1.5; P < 0.0001). Validation experiments were performed on three major kinases involved in cancer biology: ABL1, MAPK1/ERK2, and MAPK14/p38α. Mebendazole could inhibit the activity of these kinases in vitro in a dose-dependent manner, with a high potency against MAPK14 (IC50 = 104 ± 46 nM). Its direct binding to MAPK14 was further validated in vitro, and inhibition of MAPK14 kinase activity was confirmed in live glioblastoma cells. Consistent with biophysical data, molecular modeling suggested that mebendazole was able to bind to the catalytic site of MAPK14. Finally, gene silencing demonstrated that MAPK14 is involved in glioblastoma tumor spheroid growth and response to mebendazole treatment. This study thus highlighted the role of MAPK14 in the anticancer mechanism of action of mebendazole and provides further rationale for the pharmacological targeting of MAPK14 in brain tumors. It also opens new avenues for the development of novel MAPK14/p38α inhibitors to treat human diseases.
Ghislat G, Rahman T, Ballester PJ, 2020, Identification and validation of carbonic anhydrase II as the first target of the anti-inflammatory drug actarit, Biomolecules, Vol: 10, ISSN: 2218-273X
Background and purpose: Identifying the macromolecular targets of drug molecules is a fundamental aspect of drug discovery and pharmacology. Several drugs remain without known targets (orphan) despite large-scale in silico and in vitro target prediction efforts. Ligand-centric chemical-similarity-based methods for in silico target prediction have been found to be particularly powerful, but the question remains of whether they are able to discover targets for target-orphan drugs. Experimental Approach: We used one of these in silico methods to carry out a target prediction analysis for two orphan drugs: actarit and malotilate. The top target predicted for each drug was carbonic anhydrase II (CAII). Each drug was therefore quantitatively evaluated for CAII inhibition to validate these two prospective predictions. Key Results: Actarit showed in vitro concentration-dependent inhibition of CAII activity with submicromolar potency (IC50 = 422 nM) whilst no consistent inhibition was observed for malotilate. Among the other 25 targets predicted for actarit, RORγ (RAR-related orphan receptor-gamma) is promising in that it is strongly related to actarit’s indication, rheumatoid arthritis (RA). Conclusion and Implications: This study is a proof-of-concept of the utility of MolTarPred for the fast and cost-effective identification of targets of orphan drugs. Furthermore, the mechanism of action of actarit as an anti-RA agent can now be re-examined from a CAII-inhibitor perspective, given existing relationships between this target and RA. Moreover, the confirmed CAII-actarit association supports investigating the repositioning of actarit on other CAII-linked indications (e.g., hypertension, epilepsy, migraine, anemia and bone, eye and cardiac disorders).
Li H, Sze K-H, Lu G, et al., 2020, Machine-learning scoring functions for structure-based drug lead optimization, Wiley Interdisciplinary Reviews: Computational Molecular Science, Vol: 10, ISSN: 1759-0876
Molecular docking can be used to predict how strongly small-molecule binders and their chemical derivatives bind to a macromolecular target using its available three-dimensional structures. Scoring functions (SFs) are employed to rank these molecules by their predicted binding affinity (potency). A classical SF assumes a predetermined theory-inspired functional form for the relationship between the features characterizing the structure of the protein–ligand complex and its predicted binding affinity (this relationship is almost always assumed to be linear). Recent years have seen the prosperity of machine-learning SFs, which are fast regression models built instead with contemporary supervised learning algorithms. In this review, we analyzed machine-learning SFs for drug lead optimization in the 2015–2019 period. The performance gap between classical and machine-learning SFs was large and has now broadened owing to methodological improvements and the availability of more training data. Against the expectations of many experts, SFs employing deep learning techniques were not always more predictive than those based on more established machine learning techniques and, when they were, the performance gain was small. More codes and webservers are available and ready to be applied to prospective structure-based drug lead optimization studies. These have exhibited excellent predictive accuracy in compelling retrospective tests, outperforming in some cases much more computationally demanding molecular simulation-based methods. A discussion of future work completes this review.
Ahmad S, Ballester PJ, Fernandez M, 2020, Editorial: Intelligent systems for genome functional annotations, Frontiers in Genetics, Vol: 11, ISSN: 1664-8021
Naulaerts S, Menden MP, Ballester PJ, 2020, Concise polygenic models for cancer-specific identification of drug-sensitive tumors from their multi-omics profiles, Biomolecules, Vol: 10, ISSN: 2218-273X
In silico models to predict which tumors will respond to a given drug are necessary for Precision Oncology. However, predictive models are only available for a handful of cases (each case being a given drug acting on tumors of a specific cancer type). A way to generate predictive models for the remaining cases is with suitable machine learning algorithms that are yet to be applied to existing in vitro pharmacogenomics datasets. Here, we apply XGBoost integrated with a stringent feature selection approach, which is an algorithm that is advantageous for these high-dimensional problems. Thus, we identified and validated 118 predictive models for 62 drugs across five cancer types by exploiting four molecular profiles (sequence mutations, copy-number alterations, gene expression, and DNA methylation). Predictive models were found in each cancer type and with every molecular profile. On average, no omics profile or cancer type obtained models with higher predictive accuracy than the rest. However, within a given cancer type, some molecular profiles were overrepresented among predictive models. For instance, CNA profiles were predictive in breast invasive carcinoma (BRCA) cell lines, but not in small cell lung cancer (SCLC) cell lines where gene expression (GEX) and DNA methylation profiles were the most predictive. Lastly, we identified the best XGBoost model per cancer type and analyzed their selected features. For each model, some of the genes in the selected list had already been found to be individually linked to the response to that drug, providing additional evidence of the usefulness of these models and the merits of the feature selection scheme.
Ballester P, 2020, Stochastic-based Neural Network hardware acceleration for an efficient ligand-based virtual screening
Ballester PJ, 2019, Selecting machine-learning scoring functions for structure-based virtual screening, Drug Discovery Today: Technologies, Vol: 32-33, Pages: 81-87, ISSN: 1740-6749
Interest in docking technologies has grown parallel to the ever increasing number and diversity of 3D models for macromolecular therapeutic targets. Structure-Based Virtual Screening (SBVS) aims at leveraging these experimental structures to discover the necessary starting points for the drug discovery process. It is now established that Machine Learning (ML) can strongly enhance the predictive accuracy of scoring functions for SBVS by exploiting large datasets from targets, molecules and their associations. However, with greater choice, the question of which ML-based scoring function is the most suitable for prospective use on a given target has gained importance. Here we analyse two approaches to select an existing scoring function for the target along with a third approach consisting in generating a scoring function tailored to the target. These analyses required discussing the limitations of popular SBVS benchmarks, the alternatives to benchmark scoring functions for SBVS and how to generate them or use them using freely-available software.
Bomane A, Gonçalves A, Ballester PJ, 2019, Paclitaxel response can be predicted with interpretable multi-variate classifiers exploiting DNA-methylation and miRNA data, Frontiers in Genetics, Vol: 10, ISSN: 1664-8021
To address the problem of resistance to paclitaxel treatment, we have investigated to what extent it is possible to predict Breast Cancer (BC) patient response to this drug. We carried out a large-scale tumor-based prediction analysis using data from the US National Cancer Institute’s Genomic Data Commons. These data sets comprise the responses of BC patients to paclitaxel along with six molecular profiles of their tumors. We assessed 10 Machine Learning (ML) algorithms on each of these profiles and evaluated the resulting 60 classifiers on the same BC patients. DNA methylation and miRNA profiles were the most informative overall. In combination with these two profiles, ML algorithms selecting the smallest subset of molecular features generated the most predictive classifiers: a complexity-optimized XGBoost classifier based on CpG island methylation extracted a subset of molecular factors relevant to predict paclitaxel response (AUC = 0.74). A CpG site methylation-based Decision Tree (DT) combining only 2 of the 22,941 considered CpG sites (AUC = 0.89) and a miRNA expression-based DT employing just 4 of the 337 analyzed mature miRNAs (AUC = 0.72) reveal the molecular types associated with paclitaxel-sensitive and resistant BC tumors. A literature review shows that features selected by these three classifiers have been individually linked to the cytotoxic-drug sensitivities and prognosis of BC patients. Our work leads to several molecular signatures, unearthed from methylome and miRNome, able to anticipate to some extent which BC tumors respond or not to paclitaxel. These results may provide insights to optimize paclitaxel therapies in clinical practice.