14:00 – 15:00 – Sokratia Georgaka (University of Manchester)
Title: Scalable Gaussian Processes and Joint Factorisation for Pattern Discovery and Clustering in Spatially Structured Biological Count Data
Abstract: Recent advances in molecular biology have enabled precise measurement of gene expression at high resolution and with spatial context, capturing how genes are expressed at different locations within a tissue sample. These datasets, in the form of spatially structured count data, combine high-dimensional gene expression profiles with spatial coordinates and often include accompanying tissue images. While rich in information, they pose significant challenges for statistical modelling due to their large size, discrete nature, and spatial dependencies.
A natural choice for modelling gene expression count data is the negative binomial distribution, which accounts for both mean-variance relationships and overdispersion. When combined with Gaussian process regression, it becomes possible to flexibly model how gene expression varies across space.
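As a rough illustration of the idea (not the GPcounts implementation), the sketch below draws a latent spatial function from a Gaussian process and uses it, through a log link, as the mean of negative binomial counts. The RBF kernel, the one-dimensional coordinates, the dispersion parameterisation Var = mu + alpha * mu^2, and all function names are assumptions made for this example.

```python
import numpy as np


def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between all pairs of 1-D locations."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def sample_nb_gp_counts(x, alpha=0.5, seed=0):
    """Draw counts whose log-mean is a GP sample over the locations x.

    alpha is an assumed dispersion parameter: Var = mu + alpha * mu**2.
    """
    rng = np.random.default_rng(seed)
    # Small jitter on the diagonal keeps the covariance numerically stable.
    K = rbf_kernel(x) + 1e-6 * np.eye(len(x))
    f = rng.multivariate_normal(np.zeros(len(x)), K)  # latent spatial function
    mu = np.exp(f)                                    # log link: positive mean
    # Convert (mu, alpha) to NumPy's (n, p) parameterisation.
    n = 1.0 / alpha
    p = n / (n + mu)
    counts = rng.negative_binomial(n, p)
    return mu, counts


def nb_variance(mu, alpha):
    """Negative binomial variance; exceeds the Poisson variance mu when alpha > 0."""
    return mu + alpha * mu ** 2


x = np.linspace(0.0, 10.0, 50)
mu, counts = sample_nb_gp_counts(x)
```

The term alpha * mu^2 is what captures overdispersion: as alpha approaches zero the variance approaches the mean, as in a Poisson model.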
In the first part of the talk, I discuss GPcounts, a spatial inference method based on Gaussian processes with a negative binomial likelihood, designed to identify spatially variable genes. I also introduce a sparse implementation that improves its scalability to large datasets.
In the second part, I present CellPie, a computationally efficient, unsupervised method for discovering and clustering spatial patterns in this type of data. CellPie jointly models spatial gene expression and image-derived features from the accompanying tissue image using joint non-negative matrix factorisation. It employs an accelerated hierarchical least squares algorithm that enables application to very large-scale datasets.
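To make the joint factorisation concrete, here is a minimal sketch, assuming one common formulation in which both data matrices share a single spot-by-factor matrix W and a hypothetical `weight` balances the two modalities. It uses plain (non-accelerated) hierarchical least squares updates and is not CellPie's code; the function name, arguments, and objective are assumptions for illustration.

```python
import numpy as np


def joint_nmf_hals(X_expr, X_img, k, weight=1.0, n_iter=200, seed=0, eps=1e-10):
    """Minimise ||X_expr - W H_expr||^2 + weight * ||X_img - W H_img||^2
    with W, H_expr, H_img >= 0, where rows of both inputs are the same spots.

    Scaling the image block by sqrt(weight) and stacking the two matrices
    side by side turns this into a single NMF problem with a shared W.
    weight must be positive.
    """
    rng = np.random.default_rng(seed)
    n, g = X_expr.shape
    s = np.sqrt(weight)
    X = np.hstack([X_expr, s * X_img])
    W = rng.random((n, k))
    H = rng.random((k, X.shape[1]))
    losses = []
    for _ in range(n_iter):
        # Update each row of H in turn, holding everything else fixed.
        A, B = W.T @ X, W.T @ W
        for j in range(k):
            H[j] = np.maximum(eps, H[j] + (A[j] - B[j] @ H) / max(B[j, j], eps))
        # Update each column of W in turn.
        A, B = X @ H.T, H @ H.T
        for j in range(k):
            W[:, j] = np.maximum(
                eps, W[:, j] + (A[:, j] - W @ B[:, j]) / max(B[j, j], eps)
            )
        losses.append(float(np.linalg.norm(X - W @ H) ** 2))
    # Undo the scaling so H_img is on the original scale of X_img.
    H_expr, H_img = H[:, :g], H[:, g:] / s
    return W, H_expr, H_img, losses
```

One simple way to turn the shared factors into spatial clusters, again only as an illustration, is to assign each spot to its dominant factor with `W.argmax(axis=1)`.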
I apply GPcounts and CellPie across datasets from healthy and cancerous tissues, identifying coherent spatial clusters that correspond to distinct biological regions. Together, these approaches show how scalable statistical modelling and joint analysis of molecular and imaging data can reveal meaningful patterns of variation in complex, high-resolution biological datasets.
Refreshments available 15:00 – 15:30, Huxley Common Room (HXLY 549)