Basic concepts

If you wish to learn the basics of what UK Med-Bio is involved in from a more technical point of view, here is a quick introduction to some mostly-technical key concepts.

Bioinformatics

Bioinformatics is an interdisciplinary field that develops and applies computational methods to analyse large collections of biological data, such as genetic sequences, cell populations or protein samples, to make new predictions or discover new biology. There are three relatively distinct ways of working in bioinformatics, and many bioinformaticians work in two of them:

1. Data analysis. This means taking a data set from raw data to meaningful answers: cleaning the raw data, then interpreting the results statistically and visually.

2. Bioinformatics software development. This means developing software that performs bioinformatics analyses. The resulting tools are relevant, large and robust enough to be published as independent methods papers and to be used by other scientists in the corresponding field.

3. Modeling. This means building simulations, generally by formulating equations that represent biological systems.
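As an illustration of the modeling approach, here is a minimal sketch that represents a biological system with a single equation and simulates it. It assumes logistic population growth, dN/dt = r N (1 - N/K), integrated with the forward Euler method. The function name, parameter values and step size are hypothetical and chosen only for illustration; they do not describe any UK Med-Bio project.

```python
def simulate_logistic_growth(n0, rate, capacity, dt, steps):
    """Integrate dN/dt = rate * N * (1 - N / capacity) with forward Euler.

    Returns the list of population sizes, starting with n0.
    """
    population = [n0]
    for _ in range(steps):
        n = population[-1]
        # Growth slows down as the population approaches the carrying capacity.
        growth = rate * n * (1.0 - n / capacity)
        population.append(n + dt * growth)
    return population


# Hypothetical parameters: 10 cells, growth rate 0.5 per hour, capacity 1000 cells.
trajectory = simulate_logistic_growth(n0=10.0, rate=0.5, capacity=1000.0, dt=0.1, steps=400)
print(f"start: {trajectory[0]:.1f}, end: {trajectory[-1]:.1f}")
```

Forward Euler is used here only because it is the simplest integration scheme to read; real models usually rely on a dedicated ODE solver.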

High performance computing

High performance computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.

Although HPC systems are far more complex than a simple desktop computer, researchers with command line knowledge can start using the basics of high performance computing systems as long as they learn a few new concepts. In addition, most HPC systems and services are offered with a lot of support from the people who manage them.

Machine learning and deep learning

Machine learning methods are general-purpose approaches for learning functional relationships from data without the need to define them a priori. In bioinformatics, their appeal is the ability to derive predictive models without strong assumptions about the underlying mechanisms, which are frequently unknown or insufficiently defined. Predictions in genomics, proteomics, metabolomics or sensitivity to compounds all rely on machine learning approaches as a key ingredient.
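To make "learning a functional relationship without defining it a priori" concrete, here is a minimal sketch of k-nearest-neighbour regression, one of the simplest machine learning methods. It predicts a value from nearby observations and never assumes a formula linking input to output. The dose and response values are made up for illustration, and `knn_predict` is a hypothetical helper, not part of any library.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict y at `query` as the mean y of the k nearest training points."""
    # Sort the training pairs by their distance to the query and keep the k closest.
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Made-up measurements: response of a cell line at increasing compound doses.
doses = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
responses = [0.1, 0.9, 4.2, 8.8, 16.1, 25.3, 35.8]

# Estimate the response at an unmeasured dose from its two closest neighbours.
estimate = knn_predict(doses, responses, query=3.4, k=2)
print(f"estimated response at dose 3.4: {estimate:.2f}")
```

The model here is just the stored data plus a rule for averaging neighbours, which is why no mechanism has to be specified in advance.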

Deep learning, which describes a class of machine learning algorithms, has recently shown impressive results across a variety of domains. Biology and medicine are data rich, but the data are complex and often ill-understood. Problems of this nature, like patient classification, fundamental biological processes and treatment of patients, may be particularly well-suited to deep learning techniques. Yet more work is needed to address concerns related to interpretability and how best to model each problem. Furthermore, the limited amount of labeled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records.

Cohorts

Cohort studies are usually forward-looking: they are "prospective" studies, planned in advance and carried out over a future period of time. In a prospective cohort study, a research question is raised first, forming a hypothesis about the potential causes of a disease. Then a group of people, the cohort, is observed over a period of time (often several years), and data that may be relevant to the disease are collected. This makes it possible to detect changes in health in relation to the identified potential risk factors, offering a way to investigate the etiology of a disease.
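One common way to relate a health outcome to a potential risk factor in cohort data is the relative risk: the proportion of exposed participants who develop the disease divided by the proportion of unexposed participants who do. The sketch below computes it from counts; the numbers are invented for illustration and `relative_risk` is a hypothetical helper.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of disease risk in the exposed group to risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Invented follow-up counts: 30 of 1000 exposed and 10 of 1000 unexposed
# participants developed the disease during the study period.
rr = relative_risk(exposed_cases=30, exposed_total=1000,
                   unexposed_cases=10, unexposed_total=1000)
print(f"relative risk: {rr:.2f}")
```

A value above 1 suggests the disease is more frequent among exposed participants; on its own it does not establish causation.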

In UK Med-Bio

UK Med-Bio provides the infrastructure to enable all three aspects of bioinformatics, especially data analysis, which is the most computationally demanding, as well as the development, deployment and execution of machine learning software. Part of the infrastructure is embedded within Imperial College’s HPC, while independent high performance servers and resilient storage exist as well, all interconnected where necessary so that researchers can make the most of them.

The UK Med-Bio bioinformatician helps with both data analysis and software development for streamlined bioinformatics workflows. UK Med-Bio supports projects at the intersection of bioinformatics and various fields of study (e.g. epidemiology), which work on data such as genomics and metabolomics from a range of prospective cohorts and UK Biobank.