Artificial intelligence can help spot traces of natural selection



Researchers have used advanced AI and large sets of genomic data to unveil how humans have adapted to recent diseases.

The method could also be applied to new pathogens such as the coronavirus that causes COVID-19, helping identify which gene mutations may be associated with more severe cases of the disease.


The study, by researchers from Imperial College London, the Middle East Technical University, Turkey, and the Università degli Studi di Bari Aldo Moro, Italy, is published today in a Special Issue of Molecular Ecology Resources on ‘Machine Learning techniques in Evolution and Ecology’.

Natural selection is the process by which beneficial gene mutations are preserved from generation to generation, until they become dominant in our genomes – the catalogue of all our genes. One thing that can drive natural selection is protection against pathogens.

However, if a population of people moves from one environment to another, or changes its way of life, gene mutations that are protective against one pathogen could make people susceptible to new diseases.

One example of such a new disease is Familial Mediterranean Fever (FMF), an inherited autoinflammatory disease that has emerged over the past 20,000 years. FMF is prevalent in southern Europe, the Middle East and northern Africa, where around 50 percent of people today carry a gene mutation that makes them more susceptible to the disease.

Spotting selection

This prevalence of a seemingly detrimental gene mutation could be the result of two different types of natural selection. One option is ‘incomplete sweep’, where the gene mutation for susceptibility is in the process of being removed from the population, but has not yet been completely eradicated. In this case, natural selection is ongoing.

The other option is ‘balancing selection’, where some potentially detrimental gene mutations for one condition are preserved in the population because they confer some protection against a different disease. In the case of FMF, the gene mutation for susceptibility has been associated with protection against the bacterium Yersinia pestis, which causes plague.

To determine which type of natural selection is at play in FMF, the researchers turned to deep neural networks, a form of AI that is particularly good at spotting patterns and recognising images. They first trained their algorithm on datasets for which the correct answers were already known, to test its ability to spot patterns.
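For readers who want a concrete picture of what training on data with known answers involves, here is a minimal sketch in Python. It is not the researchers' method: it swaps their deep neural network for a much simpler logistic-regression classifier, and the two numeric features per example are made-up stand-ins rather than real genomic measurements. All function and variable names are hypothetical. The idea it illustrates is the general one: fit a model on labelled examples, then check how often it is right on separate labelled examples it has not seen.

```python
import math
import random


def make_labelled_examples(n, seed):
    """Build hypothetical toy data: a list of (features, label) pairs.

    label 0 stands for 'incomplete sweep', label 1 for 'balancing selection'.
    The two features are invented numbers, not real genomic statistics.
    """
    rng = random.Random(seed)
    examples = []
    for i in range(n):
        label = i % 2
        centre = -1.0 if label == 0 else 1.0
        features = [rng.gauss(centre, 0.7), rng.gauss(centre, 0.7)]
        examples.append((features, label))
    return examples


def predict_probability(weights, bias, features):
    """Return the model's probability that an example belongs to class 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    z = max(-60.0, min(60.0, z))  # clamp to keep exp() numerically safe
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, learning_rate=0.1):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = predict_probability(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def accuracy(weights, bias, examples):
    """Fraction of examples whose predicted class matches the known label."""
    correct = sum(
        (predict_probability(weights, bias, features) >= 0.5) == bool(label)
        for features, label in examples
    )
    return correct / len(examples)


# Train on one labelled set, then test on a separate labelled set.
training_set = make_labelled_examples(200, seed=1)
held_out_set = make_labelled_examples(100, seed=2)
weights, bias = train(training_set)
held_out_accuracy = accuracy(weights, bias, held_out_set)
print(f"held-out accuracy: {held_out_accuracy:.2f}")
```

Testing on a separate set matters because a model can appear accurate simply by memorising its training examples. A high score on unseen labelled data is what gives some confidence before the model is applied to data where the answer is unknown.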

The team then ran their algorithm on the database for the 1000 Genomes Project, which holds genomic data for 2,504 individuals from 26 populations, including the relevant ones around the Mediterranean. They discovered that the FMF gene mutations are still prevalent as a result of ongoing selection: the mutations have not yet reached an equilibrium, and natural selection is still acting on them.

Old and new diseases

Lead researcher Dr Matteo Fumagalli, from the Department of Life Sciences at Imperial, said: “This is the first tool to test difference between different types of natural selection, finding signals in the genome that have previously been inaccessible.

“Now we have proven that AI can be used to search genomes for subtle patterns of selection, we can use it to further investigate how humans have both adapted to old diseases, like the plague, and relatively new diseases, like FMF.”

One disease area the team are now investigating is the human relationship with coronaviruses. Humans have been living with coronaviruses for at least 50,000 years, and the greater susceptibility some people have to more severe COVID-19 could be a signal of another balancing selection mechanism.

This study was funded by The Leverhulme Trust, Erasmus+, and Imperial College FoNS European Partners award.


‘Distinguishing between recent balancing selection and incomplete sweep using deep neural networks’ by Ulas Isildak, Alessandro Stella and Matteo Fumagalli is published in Molecular Ecology Resources.



Hayley Dunning

Communications Division
