Publications

Featured publications

CapTrack

CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training

Lukas Thede, Stefan Winzeck, Zeynep Akata, Jonathan Richard Schwarz (2026)

CapTrack is a capability-centric framework for analysing forgetting in LLMs that combines a behavioural taxonomy with an evaluation suite built on established benchmarks and targeted adaptations.

GVP-WM

Grounding Generated Videos in Feasible Plans via World Models

Christos Ziakas, Amir Bar, Alessandra Russo (2026)

Grounding Video Plans with World Models (GVP-WM) grounds video-generated plans into feasible action sequences using a pre-trained action-conditioned world model via video-guided latent collocation.

SEM-CTRL

SEM-CTRL: Semantically Controlled Decoding

Mohammad Albinhassan, Pranava Madhyastha, Alessandra Russo (2026)

SEM-CTRL is a controlled decoding framework that guides and enforces rich semantic constraints on an LLM at inference time, guaranteeing correctness and enabling small LLMs to outperform frontier models without training.

AI classification of UK case law

Topic Classification of Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment

Holli Sargeant, Ahmed Izzidien, Felix Steffek (2025)

This paper addresses a critical gap in legal analytics by developing and applying a novel taxonomy for topic classification of summary judgment cases in the United Kingdom.

Read the paper

Agentic systems

Name and link to paper	Authors	About	Year
Grounding Generated Videos in Feasible Plans via World Models	Christos Ziakas, Amir Bar, Alessandra Russo	GVP-WM grounds video-generated plans into feasible action sequences using a pre-trained action-conditioned world model via video-guided latent collocation.	2026

AI safety and evaluations

Name and link to paper	Authors	About	Year
Red-Bandit: Test-Time Adaptation for LLM Red-Teaming via Bandit-Guided LoRA Experts	Christos Ziakas, Nicholas Loo, Nishita Jain, Alessandra Russo	Red-Bandit is a red-teaming framework that adapts online to identify and exploit LLM failure modes under specific attack styles (e.g., manipulation, slang) by selecting among a set of parameter-efficient LoRA experts.	2025
Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings	Andrew M Bean, Nabeel Seedat, Shengzhuang Chen, Jonathan Richard Schwarz	This paper presents an item-centric method, selecting benchmark subsets via task properties to enable efficient, interpretable and robust large language model evaluation.	2025
Aligning Language Model Benchmarks with Pairwise Preferences	Marco Gutierrez, Xinyi Leng, Hannah Cyberey, Jonathan Richard Schwarz, Ahmed Alaa, Thomas Hartvigsen	BenchAlign leverages limited performance data and pairwise rankings to produce interpretable, preference-aligned benchmarks that accurately rank unseen models and better predict real-world utility.	2026
CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training	Lukas Thede, Stefan Winzeck, Zeynep Akata, Jonathan Richard Schwarz	CapTrack is a capability-centric framework for analysing forgetting in LLMs that combines a behavioural taxonomy with an evaluation suite built on established benchmarks and targeted adaptations.	2026

Model training and reasoning

Name and link to paper	Authors	About	Year
ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization	Shengzhuang Chen, Xu Ouyang, Michael Arthur Leopold Pearce, Thomas Hartvigsen, Jonathan Richard Schwarz	An approach to data mixture selection for language models that balances computational cost and performance using Bayesian optimization.	2025
Beyond Fixed Tasks: Unsupervised Environment Design for Task-Level Pairs	Daniel Furelos-Blanco, Charles Pert, Frederik Kelbel, Alex F. Spies, Alessandra Russo, Michael Dennis	ATLAS tackles a key reinforcement learning challenge by co-designing task and environment curricula, automatically generating solvable yet challenging training pairs that dramatically outperform random sampling, especially in complex settings where viable combinations are rare.	2026
VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision–Language Models	Christos Ziakas, Alessandra Russo	VITA is a test-time adaptation method that improves both generalization and temporal reasoning of VLMs for zero-shot goal-conditioned value function estimation.	2026
SEM-CTRL: Semantically Controlled Decoding	Mohammad Albinhassan, Pranava Madhyastha, Alessandra Russo	SEM-CTRL is a controlled decoding framework that guides and enforces rich semantic constraints on an LLM at inference time, guaranteeing correctness and enabling small LLMs to outperform frontier models without training.	2026

Societal impact

Name and link to paper	Authors	About	Year
Legal Innovation: Conversations about Technology, the Legal Profession and Societal Change	Felix Steffek, Mihoko Sumida	Cambridge University Press, XVIII, 243 pp.	2025
Topic Classification of Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment	Holli Sargeant, Ahmed Izzidien, Felix Steffek	This paper addresses a critical gap in legal analytics by developing and applying a novel taxonomy for topic classification of summary judgment cases in the United Kingdom.	2025
