Imperial College London

Dr Joram M. Posma PhD MSc B AS MRSC

Faculty of Medicine, Department of Metabolism, Digestion and Reproduction

Senior Lecturer in Biomedical Informatics
 
 
 

Contact

 

j.posma11

 
 

Location

 

E305, Burlington Danes, Hammersmith Campus



 

Publications


66 results found

Kasapi M, Xu K, Ebbels T, O'Regan D, Ware J, Posma JM et al., 2024, LAVASET: An ensemble method for correlated datasets with spatial, spectral, and temporal dependencies, Bioinformatics, Vol: 40, Pages: 1-9, ISSN: 1367-4811

Motivation: Random Forests (RFs) can deal with a large number of variables, achieve reasonable prediction scores, and yield highly interpretable feature importance values. As such, RFs are appropriate models for feature selection and further dimension reduction (DR). However, RFs are often not appropriate for correlated datasets due to their mode of selecting individual features for splitting. Addressing correlation relationships in high dimensional datasets is imperative for reducing the number of variables that are assigned high importance, hence making the DR most efficient. Here, we propose the LAtent VAriable Stochastic Ensemble of Trees (LAVASET) method that derives latent variables based on the distance characteristics of each feature and aims to incorporate the correlation factor in the splitting step. Results: Without compromising on performance in the majority of examples, LAVASET outperforms RF by accurately determining feature importance across all correlated variables and ensuring proper distribution of importance values. LAVASET yields mostly non-inferior prediction accuracies to traditional RFs when tested in simulated and real 1D datasets, as well as more complex and high-dimensional 3D datatypes. Unlike traditional RFs, LAVASET is unaffected by single 'important' noisy features (false positives), as it considers the local neighbourhood. LAVASET, therefore, highlights neighbourhoods of features, reflecting real signals that collectively impact the model's predictive ability. Availability: LAVASET is freely available as a standalone package from https://github.com/melkasapi/LAVASET.

Journal article

Barcroft JF, Linton-Reid K, Landolfo C, Al-Memar M, Parker N, Kyriacou C, Munaretto M, Fantauzzi M, Cooper N, Yazbek J, Bharwani N, Lee SR, Kim JH, Timmerman D, Posma J, Savelli L, Saso S, Aboagye EO, Bourne T et al., 2024, Machine learning and radiomics for segmentation and classification of adnexal masses on ultrasound, npj Precision Oncology, Vol: 8, ISSN: 2397-768X

Ultrasound-based models exist to support the classification of adnexal masses but are subjective and rely upon ultrasound expertise. We aimed to develop an end-to-end machine learning (ML) model capable of automating the classification of adnexal masses. In this retrospective study, transvaginal ultrasound scan images with linked diagnoses (ultrasound subjective assessment or histology) were extracted and segmented from Imperial College Healthcare, UK (ICH development dataset; n = 577 masses; 1444 images) and Morgagni-Pierantoni Hospital, Italy (MPH external dataset; n = 184 masses; 476 images). A segmentation and classification model was developed using convolutional neural networks and traditional radiomics features. Dice surface coefficient (DICE) was used to measure segmentation performance and area under the ROC curve (AUC), F1-score and recall for classification performance. The ICH and MPH datasets had a median age of 45 (IQR 35-60) and 48 (IQR 38-57) years old and consisted of 23.1% and 31.5% malignant cases, respectively. The best segmentation model achieved a DICE score of 0.85 ± 0.01, 0.88 ± 0.01 and 0.85 ± 0.01 in the ICH training, ICH validation and MPH test sets. The best classification model achieved a recall of 1.00 and F1-score of 0.88 (AUC:0.93), 0.94 (AUC:0.89) and 0.83 (AUC:0.90) in the ICH training, ICH validation and MPH test sets, respectively. We have developed an end-to-end radiomics-based model capable of adnexal mass segmentation and classification, with a comparable predictive performance (AUC 0.90) to the published performance of expert subjective assessment (gold standard), and current risk models. Further prospective evaluation of the classification performance of this ML model against existing methods is required.

Journal article

Boubnovski Martell M, Linton-Reid K, Chen M, Hindocha S, Moreno P, Alvarez-Benito M, Salvatierra A, Lee R, Posma J, Calzado M, Aboagye E et al., 2024, Deep representation learning of tissue metabolome and computed tomography annotates NSCLC classification and prognosis, npj Precision Oncology, Vol: 8, ISSN: 2397-768X

The rich chemical information from tissue metabolomics provides a powerful means to elaborate tissue physiology or tumor characteristics at cellular and tumor microenvironment levels. However, the process of obtaining such information requires invasive biopsies, is costly, and can delay clinical patient management. Conversely, computed tomography (CT) is a clinical standard of care but does not intuitively harbor histological or prognostic information. Furthermore, the ability to embed metabolome information into CT to subsequently use the learned representation for classification or prognosis has yet to be described. This study develops a deep learning-based framework, tissue-metabolomic-radiomic-CT (TMR-CT), by combining 48 paired CT images and tumor/normal tissue metabolite intensities to generate ten image embeddings to infer metabolite-derived representation from CT alone. In clinical NSCLC settings, we ascertain whether TMR-CT results in an enhanced feature generation model solving histology classification/prognosis tasks in an unseen international CT dataset of 742 patients. TMR-CT non-invasively determines histological classes (adenocarcinoma/squamous cell carcinoma) with an F1-score = 0.78 and further asserts patients’ prognosis with a c-index = 0.72, surpassing the performance of radiomics models and deep learning on single modality CT feature extraction. Additionally, our work shows the potential to generate informative biology-inspired CT-led features to explore connections between hard-to-obtain tissue metabolic profiles and routine lesion-derived image data.

Journal article

Kim H, Kim C, Sohn J, Beck T, Rei M, Kim S, Simpson TI, Posma JM, Lain A, Sung M, Kang J et al., 2023, KU AIGEN ICL EDI@BC8 Track 3: Advancing phenotype named entity recognition and normalization for dysmorphology physical examination reports, AMIA 2023 Annual Symposium, Publisher: Zenodo, Pages: 1-5

The objective of BioCreative8 Track 3 is to extract phenotypic key medical findings embedded within EHR texts and subsequently normalize these findings to their Human Phenotype Ontology (HPO) terms. However, the presence of diverse surface forms in phenotypic findings makes it challenging to accurately normalize them to the correct HPO terms. To address this challenge, we explored various models for named entity recognition and implemented data augmentation techniques such as synonym marginalization to enhance the normalization step. Our pipeline resulted in an exact extraction and normalization F1 score 2.6% higher than the mean score of all submissions received in response to the challenge. Furthermore, in terms of the normalization F1 score, our approach surpassed the average performance by 1.9%. These findings contribute to the advancement of automated medical data extraction and normalization techniques, showcasing potential pathways for future research and application in the biomedical domain.

Conference paper

Rundle M, Fiamoncini J, Thomas EL, Wopereis S, Afman LA, Brennan L, Drevon CA, Gundersen TE, Daniel H, Perez IG, Posma JM, Ivanova DG, Bell JD, van Ommen B, Frost G et al., 2023, Diet-induced Weight Loss and Phenotypic Flexibility Among Healthy Overweight Adults: A Randomized Trial, The American Journal of Clinical Nutrition, Vol: 118, Pages: 591-604, ISSN: 0002-9165

Journal article

Alexander J, Posma J, Scott A, Poynter L, Mason S, Herendi L, Roberts L, McDonald J, Cameron S, Darzi A, Goldin R, Takats Z, Marchesi J, Teare J, Kinross J et al., 2023, Pathobionts in the tumour microbiota predict survival following resection for colorectal cancer, Microbiome, Vol: 11, Pages: 1-14, ISSN: 2049-2618

Background and aims: The gut microbiota is implicated in the pathogenesis of colorectal cancer (CRC). We aimed to map the CRC mucosal microbiota and metabolome and define the influence of the tumoral microbiota on oncological outcomes. Methods: A multicentre, prospective observational study was conducted of CRC patients undergoing primary surgical resection in the UK (n = 74) and Czech Republic (n = 61). Analysis was performed using metataxonomics, ultra-performance liquid chromatography-mass spectrometry (UPLC-MS), targeted bacterial qPCR and tumour exome sequencing. Hierarchical clustering accounting for clinical and oncological covariates was performed to identify clusters of bacteria and metabolites linked to CRC. Cox proportional hazards regression was used to ascertain clusters associated with disease-free survival over median follow-up of 50 months. Results: Thirteen mucosal microbiota clusters were identified, of which five were significantly different between tumour and paired normal mucosa. Cluster 7, containing the pathobionts Fusobacterium nucleatum and Granulicatella adiacens, was strongly associated with CRC (PFDR = 0.0002). Additionally, tumoral dominance of cluster 7 independently predicted favourable disease-free survival (adjusted p = 0.031). Cluster 1, containing Faecalibacterium prausnitzii and Ruminococcus gnavus, was negatively associated with cancer (PFDR = 0.0009), and abundance was independently predictive of worse disease-free survival (adjusted p = 0.0009). UPLC-MS analysis revealed two major metabolic (Met) clusters. Met 1, composed of medium chain (MCFA), long-chain (LCFA) and very long-chain (VLCFA) fatty acid species, ceramides and lysophospholipids, was negatively associated with CRC (PFDR = 2.61 × 10⁻¹¹); Met 2, composed of phosphatidylcholine species, nucleosides and amino acids, was strongly associated with CRC (PFDR&

Journal article

Alexander JL, Posma JM, Scott A, Poynter L, Mason S, Doria L, Roberts L, McDonald JA, Cameron S, Hughes D, Liska V, Susova S, Soucek P, Horneffer V, Gomez-Romero M, Herendi L, Lewis M, Hoyles L, Woolston A, Cunningham D, Darzi A, Gerlinger M, Goldin R, Takats Z, Marchesi J, Teare JP, Kinross JM et al., 2023, Networks of pathobionts in the tumour mucosal niche predict survival following colorectal cancer resection, Digestive Disease Week (DDW), Publisher: W B SAUNDERS CO-ELSEVIER INC, Pages: S469-S470, ISSN: 0016-5085

Conference paper

Garcia-Perez I, Posma JM, Chambers ES, Mathers JC, Draper J, Beckmann M, Nicholson JK, Holmes E, Frost G et al., 2023, Dietary metabotype modelling predicts individual responses to dietary interventions (Vol 1, pg 355, 2020) (Retraction of Vol 1, Pg 355, 2020), Nature Food, Vol: 4, Pages: 269-269

Journal article

Posma JM, Perez IG, Karaman I, Gao H, Chan Q, Daviglus M, Van Horn L, Holmes E, Nicholson JK, Elliott P et al., 2023, Host genomic influence on the gut microbial metabolite-blood pressure relationship, 29th Scientific Meeting of the International Society of Hypertension (Hypertension Kyoto 2022), Publisher: Lippincott, Williams & Wilkins, Pages: E240-E240, ISSN: 0263-6352

Conference paper

Brignardello J, Fountana S, Posma JM, Chambers ES, Nicholson JK, Wist J, Frost G, Garcia-Perez I, Holmes E et al., 2022, Characterization of diet-dependent temporal changes in circulating short-chain fatty acid concentrations: a randomized crossover dietary trial, The American Journal of Clinical Nutrition, Vol: 116, Pages: 1368-1378, ISSN: 0002-9165

Background: Production of short-chain fatty acids (SCFAs) from food is a complex and dynamic saccharolytic fermentation process mediated by both human and gut microbial factors. Knowledge of SCFA production and of the relationship between SCFA profiles and dietary patterns is lacking. Objective: Temporal changes in SCFA levels in response to two contrasting diets were investigated using a novel GC-MS method. Design: Samples were obtained from a randomized, controlled, crossover trial designed to characterize the metabolic response to four diets. Participants (n=19) undertook these diets during an inpatient stay (of 72-h). Serum samples were collected 2-h after breakfast (AB), lunch (AL) and dinner (AD) on day 3 and a fasting sample (FA) was obtained on day 4. 24-h urine samples were collected on day 3. In this sub-study, samples from the two extreme diets representing a diet with high adherence to WHO healthy eating recommendations and a typical Western diet were analyzed using a bespoke GC-MS method developed to detect and quantify 10 SCFAs and precursors in serum and urine samples. Results: Considerable inter-individual variation in serum SCFA concentrations was observed across all time points and temporal fluctuations were observed for both diets. Although the sample collection timing exerted a greater magnitude of effect on circulating SCFA concentrations, the unhealthy diet was associated with a lower concentration of acetic acid (FA: coefficient=-17.0; standard error (SE)=5.8; p-trend=0.00615), 2-methylbutyric acid (AL: coefficient=-0.1; SE=0.028; p-trend=4.13 × 10⁻⁴ and AD: coefficient=-0.1; SE=0.028; p-trend=2.28 × 10⁻³) and 2-hydroxybutyric acid (FA: coefficient=-15.8; SE=5.11; p-trend=4.09 × 10⁻³). In contrast, lactic acid was significantly higher in the unhealthy diet (AL: coefficient=750.2; SE=315.2; p-trend=0.024 and AD: coefficient=1219.3; SE=322.6; p-trend=8.28 × 10⁻⁴). Conclusion: The GC-MS method allowed robust mapping of

Journal article

Penney N, Yeung K, Garcia Perez I, Posma J, Kopytek A, Garratt B, Ashrafian H, Frost G, Marchesi J, Purkayastha S, Hoyles L, Darzi A, Holmes E et al., 2022, Multi-omic phenotyping reveals host-microbe responses to bariatric surgery, glycaemic control and obesity, Communications Medicine, Vol: 2, Pages: 1-18, ISSN: 2730-664X

Background: Resolution of type 2 diabetes (T2D) is common following bariatric surgery, particularly Roux-en-Y gastric bypass. However, the underlying mechanisms have not been fully elucidated. Methods: To address this we compare the integrated serum, urine and faecal metabolic profiles of participants with obesity +/- T2D (n=80, T2D=42) with participants who underwent Roux-en-Y gastric bypass or sleeve gastrectomy (pre and 3-months post-surgery; n=27), taking diet into account. We co-model these data with shotgun metagenomic profiles of the gut microbiota to provide a comprehensive atlas of host-gut microbe responses to bariatric surgery, weight-loss and glycaemic control at the systems level. Results: Here we show that bariatric surgery reverses several disrupted pathways characteristic of T2D. The differential metabolite set representative of bariatric surgery overlaps with both diabetes (19.3% commonality) and body mass index (18.6% commonality). However, the percentage overlap between diabetes and body mass index is minimal (4.0% commonality), consistent with weight-independent mechanisms of T2D resolution. The gut microbiota is more strongly correlated to body mass index than T2D, although we identify some pathways such as amino acid metabolism that correlate with changes to the gut microbiota and which influence glycaemic control. Conclusion: We identify multi-omic signatures associated with responses to surgery, body mass index, and glycaemic control. Improved understanding of gut microbiota - host co-metabolism may lead to novel therapies for weight-loss or diabetes. However, further experiments are required to provide mechanistic insight into the role of the gut microbiota in host metabolism and establish proof of causality.

Journal article

Boubnovski MM, Chen M, Linton-Reid K, Posma JM, Copley SJ, Aboagye EO et al., 2022, Development of a multi-task learning V-Net for pulmonary lobar segmentation on CT and application to diseased lungs, Clinical Radiology, Vol: 77, Pages: e620-e627, ISSN: 0009-9260

AIM: To develop a multi-task learning (MTL) V-Net for pulmonary lobar segmentation on computed tomography (CT) and application to diseased lungs. MATERIALS AND METHODS: The described methodology utilises tracheobronchial tree information to enhance segmentation accuracy through the algorithm's spatial familiarity to define lobar extent more accurately. The method undertakes parallel segmentation of lobes and auxiliary tissues simultaneously by employing MTL in conjunction with V-Net-attention, a popular convolutional neural network in the imaging realm. Its performance was validated by an external dataset of patients with four distinct lung conditions: severe lung cancer, COVID-19 pneumonitis, collapsed lungs, and chronic obstructive pulmonary disease (COPD), even though the training data included none of these cases. RESULTS: The following Dice scores were achieved on a per-segment basis: normal lungs 0.97, COPD 0.94, lung cancer 0.94, COVID-19 pneumonitis 0.94, and collapsed lung 0.92, all at p<0.05. CONCLUSION: Despite severe abnormalities, the model provided good performance at segmenting lobes, demonstrating the benefit of tissue learning. The proposed model is poised for adoption in the clinical setting as a robust tool for radiologists and researchers to define the lobar distribution of lung diseases and aid in disease treatment planning.

Journal article
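
The Dice scores quoted in the lobar segmentation entry above measure the overlap between a predicted mask and a reference mask. The following is a minimal sketch only, assuming binary masks flattened to lists of 0/1 values; the function name `dice_score` and the toy masks are illustrative and are not taken from the paper.

```python
def dice_score(pred, truth):
    """Dice coefficient 2*|A and B| / (|A| + |B|) for two equal-length 0/1 masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Count positions labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Sum of the foreground sizes of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption for this sketch: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 3 foreground voxels in each mask, 2 of them shared.
pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
print(dice_score(pred, truth))
```

A score of 1 means the two masks coincide and 0 means they share no foreground, which is how values such as 0.92 to 0.97 can be read.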

Chan Q, Wren G, Lau CH, Ebbels T, Gibson R, Loo RL, Aljuraiban G, Posma J, Dyer A, Steffen L, Rodriguez B, Appel L, Daviglus M, Elliott P, Stamler J, Holmes E, Van Horn L et al., 2022, Blood pressure interactions with the DASH dietary pattern, sodium, and potassium: The International Study of Macro-/Micronutrients and Blood Pressure (INTERMAP), The American Journal of Clinical Nutrition, Vol: 116, Pages: 216-229, ISSN: 1938-3207

Background: Adherence to the Dietary Approaches to Stop Hypertension (DASH) diet enhances potassium intake and reduces sodium intake and blood pressure (BP), but the underlying metabolic pathways are unclear. Objective: Among free-living populations, delineate metabolic signatures associated with the DASH diet adherence, 24-hr urinary sodium and potassium excretions and the potential metabolic pathways involved. Design: 24-hr urinary metabolic profiling by proton nuclear magnetic resonance spectroscopy was used to characterize the metabolic signatures associated with the DASH dietary pattern score (DASH score) and 24-hr excretion of sodium and potassium among participants in the United States (n=2,164) and United Kingdom (n=496) enrolled in the International Study of Macro- and Micronutrients and Blood Pressure (INTERMAP). Multiple linear regression and cross-tabulation analyses were used to investigate the DASH-BP relation and its modulation by sodium and potassium. Potential pathways associated with DASH adherence, sodium and potassium excretion, and BP were identified using mediation analyses and metabolic reaction networks. Results: Adherence to DASH diet was associated with urinary potassium excretion (correlation coefficient, r = 0.42, P<0.0001). In multivariable regression analyses, a five-point higher DASH score (range 7 to 35) was associated with a lower systolic BP by 1.35 mmHg (95% confidence interval: -1.95, -0.80, P=1.2 × 10⁻⁵); control of the model for potassium but not sodium attenuated the DASH-BP relation. Two common metabolites (hippurate and citrate) mediated the potassium-BP and DASH-BP relationships, while five metabolites (succinate, alanine, S-methyl cysteine sulfoxide, 4-hydroxyhippurate, phenylacetylglutamine) were found specific to the DASH-BP relation. Conclusions: Greater adherence to DASH diet is associated with lower BP and higher potassium intake across levels of sodium intake. The DASH diet recommends greater intake of fruits, veget

Journal article

Mujagic Z, Kasapi M, Jonkers DMAE, Garcia Perez I, Vork L, Weerts ZZRM, Serrano Contreras JI, Zhernakova A, Kurilshikov A, Scotcher J, Holmes E, Wijmenga C, Keszthelyi D, Nicholson J, Posma JM, Masclee AAM et al., 2022, Integrated fecal microbiome–metabolome signatures reflect stress and serotonin metabolism in irritable bowel syndrome, Gut Microbes, Vol: 14, Pages: 1-20, ISSN: 1949-0976

To gain insight into the complex microbiome-gut-brain axis in irritable bowel syndrome (IBS), several modalities of biological and clinical data must be combined. We aimed to identify profiles of faecal microbiota and metabolites associated with IBS and to delineate specific phenotypes of IBS that represent potential pathophysiological mechanisms. Faecal metabolites were measured using proton Nuclear Magnetic Resonance (1H-NMR) spectroscopy and gut microbiome using Shotgun Metagenomic Sequencing (MGS) in a combined dataset of 142 IBS patients and 120 healthy controls (HC) with extensive clinical, biological and phenotype information. Data were analysed using support vector classification and regression and kernel t-SNE. Microbiome and metabolome profiles could distinguish IBS and HC with an area-under-the-receiver-operator-curve (AUC) of 77.3% and 79.5%, respectively, but this could be improved by combining microbiota and metabolites to 83.6%. No significant differences in predictive ability of the microbiome-metabolome data were observed between the three classical, stool pattern-based, IBS subtypes. However, unsupervised clustering showed distinct subsets of IBS patients based on faecal microbiome-metabolome data. These clusters could be related to plasma levels of serotonin and its metabolite 5-hydroxyindoleacetate, effects of psychological stress on gastrointestinal symptoms, onset of IBS after stressful events, medical history of previous abdominal surgery, dietary caloric intake and IBS symptom duration. Furthermore, pathways in metabolic reaction networks that reflect the host-microbiome interactions in IBS were integrated with microbiota data. The identified microbiome-metabolome signatures for IBS, associated with altered serotonin metabolism and unfavourable stress-response related to gastrointestinal symptoms, support the microbiota-gut-brain link in the pathogenesis of IBS.

Journal article

Yeung C, Beck T, Posma JM, 2022, MetaboListem and TABoLiSTM: two deep learning algorithms for metabolite named entity recognition, Metabolites, Vol: 12, Pages: 1-23, ISSN: 2218-1989

Reviewing the metabolomics literature is becoming increasingly difficult because of the rapid expansion of relevant journal literature. Text-mining technologies are therefore needed to facilitate more efficient literature reviews. Here we contribute a standardised corpus of full-text publications from metabolomics studies and describe the development of two metabolite named entity recognition (NER) methods. These methods are based on Bidirectional Long Short-Term Memory (BiLSTM) networks and each incorporates different transfer learning techniques (for tokenisation and word embedding). Our first model (MetaboListem) follows prior methodology using GloVe word embeddings. Our second model exploits BERT and BioBERT for embedding and is named TABoLiSTM (Transformer-Affixed BiLSTM). The methods are trained on a novel corpus annotated using rule-based methods, and evaluated on manually annotated metabolomics articles. MetaboListem (F1 score 0.890, precision 0.892, recall 0.888) and TABoLiSTM (BioBERT version: F1 score 0.909, precision 0.926, recall 0.893) have achieved state-of-the-art performance on metabolite NER. A training corpus with full-text sentences from >1,000 full-text Open Access metabolomics publications with 105,335 annotated metabolites was created, as well as a manually annotated test corpus (19,138 annotations). This work demonstrates that deep learning algorithms are capable of identifying metabolite names accurately and efficiently in text. The proposed corpus and NER algorithms can be used for metabolomics text-mining tasks such as information retrieval, document classification and literature-based discovery. They are available from https://github.com/omicsNLP/MetaboliteNER.

Journal article
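
The F1 scores reported in the metabolite NER entry above follow from the quoted precision and recall, since F1 is their harmonic mean. As a hedged illustration (the helper `f1_score` is a name chosen here and is not part of the MetaboliteNER repository), the reported figures can be checked like this:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted in the abstract above.
reported = {
    "MetaboListem": (0.892, 0.888, 0.890),
    "TABoLiSTM (BioBERT)": (0.926, 0.893, 0.909),
}

for name, (precision, recall, f1_reported) in reported.items():
    f1_computed = round(f1_score(precision, recall), 3)
    print(f"{name}: computed F1 = {f1_computed:.3f}, reported F1 = {f1_reported:.3f}")
```

Both recomputed values agree with the reported scores to three decimal places.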

Yeung C, Beck T, Posma JM, 2022, MetaboListem and TABoLiSTM: two deep learning algorithms for metabolite named entity recognition, Publisher: bioRxiv

Reviewing the metabolomics literature is becoming increasingly difficult because of the rapid expansion of relevant journal literature. Text-mining technologies are therefore needed to facilitate more efficient literature review. Here we contribute a standardised corpus of full-text publications from metabolomics studies and describe the development of two new metabolite named entity recognition (NER) methods. We introduce two deep learning methods for metabolite NER based on Bidirectional Long Short-Term Memory (BiLSTM) networks incorporating different transfer learning techniques. Our first model (MetaboListem) follows prior methodology using GloVe word embeddings. Our second model exploits BERT and BioBERT for embedding and is named TABoLiSTM (Transformer-Affixed BiLSTM). The methods are trained on a novel corpus annotated using rule-based methods, and evaluated on manually annotated metabolomics articles. MetaboListem (F1 score 0.890, precision 0.892, recall 0.888) and TABoLiSTM (BioBERT version: F1 score 0.909, precision 0.926, recall 0.893) have achieved state-of-the-art performance on metabolite NER. A corpus with >1,200 full-text Open Access metabolomics publications and >116,000 annotated metabolites was created. This work demonstrates that deep learning algorithms are capable of identifying metabolite names accurately and efficiently in text. The proposed corpus and NER algorithms can be used for metabolomics text-mining tasks such as information retrieval, document classification and literature-based discovery. The corpus and NER algorithms are freely available with detailed instructions from GitHub at https://github.com/omicsNLP/MetaboliteNER.

Working paper

Beck T, Shorter T, Hu S, Li Z, Sun S, Popovici C, McQuibban NAR, Makraduli F, Yeung C, Rowlands T, Posma JM et al., 2022, Auto-CORPus: a natural language processing tool for standardising and reusing biomedical literature, Frontiers in Digital Health, Vol: 4, ISSN: 2673-253X

To analyse large corpora using machine learning and other Natural Language Processing (NLP) algorithms, the corpora need to be standardised. The BioC format is a community-driven simple data structure for sharing text and annotations; however, there is limited access to biomedical literature in BioC format and a lack of bioinformatics tools to convert online publication HTML formats to BioC. We present Auto-CORPus (Automated pipeline for Consistent Outputs from Research Publications), a novel NLP tool for the standardisation and conversion of publication HTML and table image files to three convenient machine-interpretable outputs to support biomedical text analytics. Firstly, Auto-CORPus can be configured to convert HTML from various publication sources to BioC. To standardise the description of heterogeneous publication sections, the Information Artifact Ontology is used to annotate each section within the BioC output. Secondly, Auto-CORPus transforms publication tables to a JSON format to store, exchange and annotate table data between text analytics systems. The BioC specification does not include a data structure for representing publication table data, so we present a JSON format for sharing table content and metadata. Inline tables within full-text HTML files and linked tables within separate HTML files are processed and converted to machine-interpretable table JSON format. Finally, Auto-CORPus extracts abbreviations declared within publication text and provides an abbreviations JSON output that relates an abbreviation with the full definition. This abbreviation collection supports text mining tasks such as named entity recognition by including abbreviations unique to individual publications that are not contained within standard bio-ontologies and dictionaries. The Auto-CORPus package is freely available with detailed instructions from GitHub at https://github.com/omicsNLP/Auto-CORPus.

Journal article

Abbott K, Posma JM, Garcia Perez I, Udeh-Momoh C, Ahmadi-Abhari S, Middleton L, Frost G et al., 2022, Evidence-Based Tools for Dietary Assessments in Nutrition Epidemiology Studies for Dementia Prevention, The Journal of Prevention of Alzheimer's Disease, ISSN: 2274-5807

Increasing evidence proposes diet as a notable modifiable factor and viable target for the reduction of Alzheimer’s Disease risk and age-related cognitive decline. However, assessment of dietary exposures is challenged by dietary capture methods that are prone to misreporting and measurement errors. The utility of -omics technologies for the evaluation of dietary exposures has the potential to improve reliability and offer new insights to pre-disease indicators and preventive targets in cognitive aging and dementia. In this review, we present a focused overview of metabolomics as a validation tool and framework for investigating the immediate or cumulative effects of diet on cognitive health.

Journal article

Penney N, Yeung D, Garcia-Perez I, Posma J, Kopytek A, Garratt B, Ashrafian H, Frost G, Marchesi J, Purkayastha S, Hoyles L, Darzi A, Holmes E et al., 2021, Longitudinal Multi-omic Phenotyping Reveals Host-microbe Responses to Bariatric Surgery, Glycaemic Control and Obesity

Resolution of type-2 diabetes (T2D) is common following bariatric surgery, particularly Roux-en-Y gastric bypass (RYGB). However, the underlying mechanisms have not been fully elucidated. To address this we compared the integrated serum, urine and faecal metabolic profiles of obese participants with and without T2D (n=81, T2D=42) with participants who underwent RYGB or sleeve gastrectomy (pre and 3-months post-surgery; n=27), taking diet into account. We co-modelled these data with shotgun metagenomic profiles of the gut microbiota to provide a comprehensive atlas of host-gut microbe responses to bariatric surgery, weight-loss and glycaemic control at the systems level. Bariatric surgery reversed a number of disrupted pathways characteristic of T2D. The differential metabolite set representative of bariatric surgery overlapped with both diabetes (19.3% commonality) and BMI (18.6% commonality). However, the percentage overlap between diabetes and BMI was minimal (4.0% commonality), consistent with weight-independent mechanisms of T2D resolution. The gut microbiota was more strongly correlated to BMI than T2D, although we identified some pathways such as amino acid metabolism that correlated with changes to the gut microbiota and which influence glycaemic control. Improved understanding of GM-host co-metabolism may lead to novel therapies for weight-loss or diabetes.

Journal article

Wu Y, Posma JM, Holmes E, Chambers E, Frost G, Garcia Perez I et al., 2021, Odd chain fatty acids are not robust biomarkers for dietary intake of fiber, Molecular Nutrition and Food Research, Vol: 65, Pages: 1-8, ISSN: 1613-4125

Prior investigation has suggested a positive association between increased colonic propionate production and circulating odd-chain fatty acids (OCFAs; pentadecanoic acid [C15:0], heptadecanoic acid [C17:0]). As the major source of propionate in humans is the microbial fermentation of dietary fiber, OCFAs have been proposed as candidate biomarkers of dietary fiber. The objective of this study is to critically assess the plausibility, robustness, reliability, dose-response and time-response aspects of OCFAs as potential biomarkers of fermentable fibers in two independent studies using a validated analytical method. OCFAs were first assessed in a fiber supplementation study, where 21 participants received 10g dietary fiber supplementation for 7 days with blood samples collected on the final day at a 420 minute study visit. OCFAs were then assessed in a highly controlled inpatient setting, in which 19 participants consumed a high fiber (45.1g/day) and a low fiber diet (13.6g/day) for 4 days. In both studies, dietary intake of fiber, whether as fiber supplementation or as a high fiber diet, did not increase circulating levels of OCFAs. The dose and temporal relations were not observed. The current study has generated new insight on the utility of OCFAs as fiber biomarkers and highlighted the importance of critical assessment of candidate dietary biomarkers before application.

Journal article

Posma JM, Garcia-Perez I, Frost G, Aljuraiban GS, Chan Q, Van Horn L, Daviglus M, Stamler J, Holmes E, Elliott P, Nicholson JK et al., 2021, Nutriome-metabolome relationships provide insights into dietary intake and metabolism (vol 1, pg 426, 2020), NATURE FOOD, Vol: 2, Pages: 541-542

Journal article

Li Z, Makraduli F, Yeung C, McQuibban NAR, Popovici C, Sun S, Hu Y, Rowlands T, Posma JM, Beck T et al., 2021, Auto-CORPus: Automated and Consistent Outputs from Research Publications, UK healthcare text analytics conference 2021

The availability of improved natural language processing (NLP) algorithms and models enables researchers to analyse larger corpora using open source tools. Text mining of biomedical literature is one area for which NLP has been used in recent years with large untapped potential. However, to generate corpora that can be analysed using machine learning NLP algorithms, these need to be standardized. Summarizing data from literature to be stored into databases typically requires manual curation, especially for extracting data from result tables. We present an automated pipeline that cleans HTML files from biomedical literature. The outputs are JSON files that contain the text for each section and table data in machine-readable format, and that list the phenotypes, assays, chemical compounds, SNPs, P-values and abbreviations found in the article. We analysed a total of 2,441 Open Access articles from PubMed Central, from both Genome-Wide and Metabolome-Wide Association Studies, and developed a model to standardize the section headers based on the Information Artifact Ontology. Extraction of table data was developed on PubMed articles and fine-tuned using the equivalent publisher versions. As part of this work we found evidence of tables being converted to figures by authors as well as publishers. To this end we have developed a pipeline that converts the table from an image back to text while keeping the table structure intact. We have fine-tuned the Tesseract optical character recognition (OCR) algorithm specifically for biomedical table data. We have improved the accuracy of recognising characters in table-images using the original Tesseract algorithm from 53% to 90% when evaluated on 233 tables from 80 publications. In summary, Auto-CORPus can be used to create a corpus for different fields where the section headers are standardised to allow NLP algorithms to be applied to specific paragraphs, rather than only on abstracts or the full text.

Conference paper
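
The pipeline idea in the Auto-CORPus abstract above (clean an HTML article, split it into sections, standardise the section headers, emit JSON) can be illustrated with a minimal sketch. This is not the Auto-CORPus code or its API: the `SectionParser` class, the `html_to_json` function, the assumption that sections are marked by `<h2>` headings followed by `<p>` paragraphs, and the small `HEADER_MAP` lookup (a stand-in for the ontology-based header model) are all hypothetical and chosen only for illustration.

```python
import json
from html.parser import HTMLParser

# Hypothetical lookup standing in for the header-standardisation model.
HEADER_MAP = {
    "materials and methods": "methods",
    "methodology": "methods",
    "findings": "results",
}


class SectionParser(HTMLParser):
    """Collect paragraph text under each <h2> heading (assumed page layout)."""

    def __init__(self):
        super().__init__()
        self.sections = []  # one dict per section: {"header": str, "text": str}
        self._in_header = False
        self._in_para = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_header = True
            self.sections.append({"header": "", "text": ""})
        elif tag == "p":
            self._in_para = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_header = False
        elif tag == "p":
            self._in_para = False
            if self.sections:
                # Separate consecutive paragraphs with a space.
                self.sections[-1]["text"] += " "

    def handle_data(self, data):
        if self._in_header:
            self.sections[-1]["header"] += data
        elif self._in_para and self.sections:
            self.sections[-1]["text"] += data


def html_to_json(html):
    """Return a JSON string with standardised headers and cleaned section text."""
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    cleaned = []
    for section in parser.sections:
        header = " ".join(section["header"].split()).lower()
        cleaned.append({
            "header": HEADER_MAP.get(header, header),
            "text": " ".join(section["text"].split()),
        })
    return json.dumps({"sections": cleaned}, indent=2)
```

Inline markup such as `<i>` is dropped while its text is kept, and text outside any section is ignored; a real article would need handling for nested headings, tables and figures, which this sketch leaves out.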

Boubnovski Martell M, Chen M, Linton-Reid K, Posma JM, Copley S, Aboagye E et al., 2021, [pre-print] Development of a Multi-Task Learning V-Net for Pulmonary Lobar Segmentation on Computed Tomography and Application to Diseased Lungs, Publisher: arXiv

Automated lobar segmentation allows regional evaluation of lung disease and is important for diagnosis and therapy planning. Advanced statistical workflows permitting such evaluation are needed within respiratory medicine; their adoption remains slow, with poor workflow accuracy. Diseased lung regions often produce high-density zones on CT images, limiting an algorithm's ability to delineate damaged lobes due to oblique or lacking fissures. This impact motivated developing an improved machine learning method to segment lung lobes that utilises tracheobronchial tree information to enhance segmentation accuracy through the algorithm's spatial familiarity to define lobar extent more accurately. The method undertakes parallel segmentation of lobes and auxiliary tissues simultaneously by employing multi-task learning (MTL) in conjunction with V-Net-attention, a popular convolutional neural network in the imaging realm. In keeping with the model's adeptness for better generalisation, high performance was retained in an external dataset of patients with four distinct diseases: severe lung cancer, COVID-19 pneumonitis, collapsed lungs and Chronic Obstructive Pulmonary Disease (COPD), even though the training data included none of these cases. The benefit of our external validation test is specifically relevant since our choice includes those patients who have diagnosed lung disease with associated radiological abnormalities. To ensure equal rank is given to all segmentations in the main task we report the following performance (Dice score) on a per-segment basis: normal lungs 0.97, COPD 0.94, lung cancer 0.94, COVID-19 pneumonitis 0.94 and collapsed lung 0.92, all at p<0.05. Even when segmenting lobes with large deformations on CT images, the model maintained high accuracy. The approach can be readily adopted in the clinical setting as a robust tool for radiologists.

Working paper
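
The Dice score reported per segment in the abstract above measures overlap between a predicted and a reference segmentation: twice the number of voxels both assign to a label, divided by the total number of voxels each assigns to it. The sketch below shows that calculation on flattened label lists. The function name `dice_score`, the flat-list input and the convention of returning 1.0 when a label is absent from both inputs are assumptions made for illustration, not details taken from the paper.

```python
def dice_score(pred, truth, label):
    """Dice coefficient for one label over two equal-length label sequences."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    total = pred_count + truth_count
    if total == 0:
        # Assumed convention: a label absent from both counts as perfect agreement.
        return 1.0
    return 2 * overlap / total


def per_label_dice(pred, truth, labels):
    """Report the score separately for every label, mirroring per-segment reporting."""
    return {label: dice_score(pred, truth, label) for label in labels}
```

For example, `per_label_dice([1, 1, 2, 2, 0], [1, 2, 2, 2, 0], [0, 1, 2])` scores each of the three labels separately, so a poorly segmented small lobe is not masked by a well segmented large one.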

Barker GF, Pechlivanis A, Bello AT, Chrysostomou D, Mullish BH, Marchesi J, Posma JM, Kinross JM, Nicholson J, O'Keefe SJ, Li JV et al., 2021, Aa022 a high-fiber low-fat diet increases fecal levels of lithocholic acid derivative 3-ketocholanic acid, Digestive Disease Week, Publisher: W B SAUNDERS CO-ELSEVIER INC, Pages: S393-S394, ISSN: 0016-5085

Conference paper

Posma JM, Stamler J, Garcia-Perez I, Chan Q, Wijeyesekera A, Daviglus M, Van Horn L, Holmes E, Nicholson J, Elliott P et al., 2021, Urinary metabolic phenotype of blood pressure, 19TH INTERNATIONAL SHR SYMPOSIUM SHR, Publisher: Lippincott, Williams & Wilkins, Pages: E70-E70, ISSN: 0263-6352

Objective: Metabolic phenotyping (metabolomics) captures systems-level information on metabolic processes by simultaneously measuring hundreds of metabolites using spectroscopic techniques. Concentrations of these metabolites are affected by genetic (host, microbiome), environmental and dietary factors and may provide insights into biochemical pathways underlying raised blood pressure (BP) in populations. Design and method: Two separate, timed 24hr urine specimens were obtained from 2,031 women and men, aged 40–59, from 8 USA population samples in the INTERMAP Study. Proton Nuclear Magnetic Resonance (1H NMR) was used to characterize a urinary metabolic signature; this was unaffected by diurnal variability and sampling time as it captures end-products of metabolism over a 24hr period. Demographic, population, medical, lifestyle and anthropometric factors were accounted for in regression models to define a urinary metabolic phenotype associated with BP. Results: 29 structurally identified urinary metabolites covaried with systolic BP (SBP), after adjustment for demographic variables, and 18 metabolites with diastolic BP (DBP), with 16 metabolites overlapping between SBP and DBP. These included metabolites related to energy metabolism, renal function, diet and gut microbiota. After adjustment for medical and lifestyle covariates, 22/14 metabolites remained associated with SBP/DBP. Joint covariate-metabolite penalized regression models identified Body Mass Index, age and family history as most important contributors, with 14 metabolites, including gut microbial co-metabolites, also included in the model. Metabolites were mapped in a symbiotic metabolic reaction network, that includes reactions mediated by 3,344 commensal gut microbial species, to highlight affected pathways (Figure). Significant single nucleotide polymorphisms (SNPs) from genome-wide association studies on cardiometabolic risk factors were mapped to genes in this network. This revealed multiple sub

Conference paper

Jordi M-P, Wellington A, Lubach G, Posma J, Coe C, Swann J et al., 2021, Gut microbial and metabolic profiling reveal the lingering effects of infantile iron deficiency unless treated with iron, Molecular Nutrition and Food Research, Vol: 65, ISSN: 1613-4125

Scope: Iron deficiency (ID) compromises the health of infants worldwide. Although readily treated with iron, concerns remain about the persistence of some effects. Metabolic and gut microbial consequences of infantile ID were investigated in juvenile monkeys after natural recovery (pID) from iron deficiency or post‐treatment with iron dextran and B vitamins (pID+Fe). Methods and Results: Metabolomic profiling of urine and plasma is conducted with 1H nuclear magnetic resonance (NMR) spectroscopy. Gut microbiota are characterized from rectal swabs by amplicon sequencing of the 16S rRNA gene. Urinary metabolic profiles of pID monkeys significantly differed from pID+Fe and continuously iron‐sufficient controls (IS) with higher maltose and lower amounts of microbial‐derived metabolites. Persistent differences in energy metabolism are apparent from the plasma metabolic phenotypes with greater reliance on anaerobic glycolysis in pID monkeys. Microbial profiling indicated higher abundances of Methanobrevibacter, Lachnobacterium, and Ruminococcus in pID monkeys and any history of ID resulted in a lower Prevotella abundance compared to the IS controls. Conclusions: Lingering metabolic and microbial effects are found after natural recovery from ID. These long‐term biochemical derangements are not present in the pID+Fe animals emphasizing the importance of the early detection and treatment of early‐life ID to ameliorate its chronic metabolic effects.

Journal article

Hu Y, Sun S, Rowlands T, Beck T, Posma JM et al., 2021, Auto-CORPus: automated and consistent outputs from research publications, Publisher: bioRxiv

Motivation: The availability of improved natural language processing (NLP) algorithms and models enables researchers to analyse larger corpora using open source tools. Text mining of biomedical literature is one area for which NLP has been used in recent years with large untapped potential. However, in order to generate corpora that can be analyzed using machine learning NLP algorithms, these need to be standardized. Summarizing data from literature to be stored into databases typically requires manual curation, especially for extracting data from result tables. Results: We present here an automated pipeline that cleans HTML files from biomedical literature. The output is a single JSON file that contains the text for each section, table data in machine-readable format and lists of phenotypes and abbreviations found in the article. We analyzed a total of 2,441 Open Access articles from PubMed Central, from both Genome-Wide and Metabolome-Wide Association Studies, and developed a model to standardize the section headers based on the Information Artifact Ontology. Extraction of table data was developed on PubMed articles and fine-tuned using the equivalent publisher versions. Availability: The Auto-CORPus package is freely available with detailed instructions from GitHub at https://github.com/jmp111/AutoCORPus/.

Working paper

Penney N, Barton W, Posma J, Darzi A, Frost G, Cotter P, Holmes E, Shanahan F, O Sullivan O, Garcia Perez I et al., 2020, Investigating the role of diet and exercise in gut microbe-host cometabolism, mSystems, Vol: 5, Pages: 1-16, ISSN: 2379-5077

We investigated the individual and combined effects of diet and physical exercise on metabolism and the gut microbiome to establish how these lifestyle factors influence host-microbiome cometabolism. Urinary and fecal samples were collected from athletes and less active controls. Individuals were further classified according to an objective dietary assessment score of adherence to healthy dietary habits according to WHO guidelines, calculated from their proton nuclear magnetic resonance (1H-NMR) urinary profiles. Subsequent models were generated comparing extremes of dietary habits, exercise, and the combined effect of both. Differences in metabolic phenotypes and gut microbiome profiles between the two groups were assessed. Each of the models pertaining to diet healthiness, physical exercise, or a combination of both displayed a metabolic and functional microbial signature, with a significant proportion of the metabolites identified as discriminating between the various pairwise comparisons resulting from gut microbe-host cometabolism. Microbial diversity was associated with a combination of high adherence to healthy dietary habits and exercise and was correlated with a distinct array of microbially derived metabolites, including markers of proteolytic activity. Improved control of dietary confounders, through the use of an objective dietary assessment score, has uncovered further insights into the complex, multifactorial relationship between diet, exercise, the gut microbiome, and metabolism. Furthermore, the observation of higher proteolytic activity associated with higher microbial diversity indicates that increased microbial diversity may confer deleterious as well as beneficial effects on the host.

Journal article

Garcia Perez I, Posma JM, Serrano Contreras JI, Boulange C, Chan Q, Frost G, Stamler J, Elliott P, Lindon J, Holmes E, Nicholson J et al., 2020, Identifying unknown metabolites using NMR-based metabolic profiling techniques, Nature Protocols, Vol: 15, Pages: 2538-2567, ISSN: 1750-2799

Metabolic profiling of biological samples provides important insights into multiple physiological and pathological processes, but is hindered by a lack of automated annotation and standardised methods for structure elucidation of candidate disease biomarkers. Here, we describe a system for identifying molecular species derived from NMR spectroscopy based metabolic phenotyping studies, with detailed information on sample preparation, data acquisition, and data modelling. We provide eight different modular workflows to be followed in a recommended sequential order according to their level of difficulty. This multi-platform system involves the use of statistical spectroscopic tools such as STOCSY, STORM and RED-STORM to identify other signals in the NMR spectra relating to the same molecule. It also utilizes 2D-NMR spectroscopic analysis, separation and pre-concentration techniques, multiple hyphenated analytical platforms and data extraction from existing databases. The complete system, using all eight workflows, would take up to a month, as it includes multidimensional NMR experiments that require prolonged experiment times. However, easier identification cases using fewer steps would take two or three days. This approach to biomarker discovery is efficient, cost-effective and offers increased chemical space coverage of the metabolome, resulting in faster and more accurate assignment of NMR-generated biomarkers arising from metabolic phenotyping studies. Finally, it requires a basic understanding of Matlab to run the statistical spectroscopic tools, and analytical skills to perform Solid Phase Extraction, LC-fraction collection, LC-NMR-MS and 1D and 2D NMR experiments.

Journal article
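
The core idea behind STOCSY, mentioned in the abstract above as a way to find other signals belonging to the same molecule, is to correlate the intensity of one chosen "driver" peak with every other spectral variable across samples; variables that track the driver closely are candidates for the same compound. The sketch below illustrates only that correlation step, in Python rather than the Matlab the protocol refers to. The function names `pearson` and `stocsy`, the samples-by-variables list-of-lists input and the choice to return 0.0 for a constant variable are assumptions for illustration and are not taken from the protocol.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def stocsy(spectra, driver_index):
    """Correlate the driver variable with every variable.

    spectra: list of samples, each a list of intensities (same length per sample).
    Returns one correlation coefficient per spectral variable.
    """
    columns = list(zip(*spectra))  # transpose to variables x samples
    driver = columns[driver_index]
    return [pearson(driver, column) for column in columns]
```

In practice the resulting coefficients would be inspected along the chemical shift axis, with high absolute values flagging peaks that may be structurally related to the driver.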

Eriksen R, Garcia Perez I, Posma JM, Haid M, Sharma S, Prehn C, Thomas LE, Koivula RW, Bizzotto R, Prehn C, Mari A, Giordano GN, Pavo I, Schwenk JM, De Masi F, Tsirigos KD, Brunak S, Viñuela A, Mahajan A, McDonald TJ, Kokkola T, Rutter F, Teare H, Hansen TH, Fernandez J, Jones A, Jennison C, Walker M, McCarthy MI, Pedersen O, Ruetten H, Forgie I, Bell JD, Pearson ER, Franks PW, Adamski J, Holmes E, Frost G et al., 2020, Dietary metabolite profiling brings new insight into the relationship between nutrition and metabolic risk: An IMI DIRECT study, EBioMedicine, Vol: 58, Pages: 1-9, ISSN: 2352-3964

Background: Dietary advice remains the cornerstone of prevention and management of type 2 diabetes (T2D). However, understanding the efficacy of dietary interventions is confounded by the challenges inherent in assessing free living diet. Here we profiled dietary metabolites to investigate glycaemic deterioration and cardiometabolic risk in people at risk of or living with T2D. Methods: We analysed data from plasma collected at baseline and 18-month follow-up in individuals from the Innovative Medicines Initiative (IMI) Diabetes Research on Patient Stratification (DIRECT) cohort 1 n = 403 individuals with normal or impaired glucose regulation (prediabetic) and cohort 2 n = 458 individuals with new onset of T2D. A dietary metabolite profile model (Tpred) was constructed using multivariable regression of 113 plasma metabolites obtained from targeted metabolomics assays. The continuous Tpred score was used to explore the relationships between diet, glycaemic deterioration and cardio-metabolic risk via multiple linear regression models. Findings: A higher Tpred score was associated with healthier diets high in wholegrain (β=3.36 g, 95% CI 0.31, 6.40 and β=2.82 g, 95% CI 0.06, 5.57) and lower energy intake (β=-75.53 kcal, 95% CI -144.71, -2.35 and β=-122.51 kcal, 95% CI -186.56, -38.46), and saturated fat (β=-0.92 g, 95% CI -1.56, -0.28 and β=–0.98 g, 95% CI -1.53, -0.42 g), respectively for cohort 1 and 2. In both cohorts a higher Tpred score was also associated with lower total body adiposity and favourable lipid profiles HDL-cholesterol (β=0.07 mmol/L, 95% CI 0.03, 0.1), (β=0.08 mmol/L, 95% CI 0.04, 0.1), and triglycerides (β=-0.1 mmol/L, 95% CI -0.2, -0.03), (β=-0.2 mmol/L, 95% CI -0.3, -0.09), respectively for cohort 1 and 2. In cohort 2, the Tpred score was negatively associated with liver fat (β=-0.74%, 95% CI -0.67, -0.81), and lower fasting concentrations of HbA1c (β=-0.9 mmol/mol, 95% CI -1.5, -0.1), glu

Journal article

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
