IDE Seminar: Dr Emma Hodcroft

Pathoplexus: A Community-Driven Solution for Transparent and Equitable Viral Genomic Data Sharing

Sharing viral genomic data is critical for scientific research and for informing public health responses. Although platforms like the International Nucleotide Sequence Database Collaboration (INSDC: NCBI, ENA, DDBJ) and GISAID facilitate data sharing, they do not fully address the specific needs of the pathogen genomics community. Persistent concerns about data misuse, “scooping,” and limitations on data reuse from protected repositories highlight the need for more flexible and equitable data sharing.

Pathoplexus is a specialized, community-driven viral genomics database designed to balance data utility with submitter autonomy. It combines open-source technologies with transparent governance, allowing data submitters to control how their data is used for up to one year if desired, while promoting rapid data access for researchers and public health officials. All Pathoplexus submissions are automatically uploaded to INSDC: either immediately if fully open, or after one year if initially shared as “restricted-use.” Features like SeqSets, which provide citable DOIs to contributors, ensure submitters receive appropriate credit while supporting open, flexible data-sharing terms.

Pathoplexus builds on Loculus, an open-source tool for managing viral sequence databases that laboratories and organizations can also use to set up their own databases. The platform’s web interface and API support both interactive and programmatic analyses. Pathoplexus is a non-profit association governed by an international Executive Board and includes members from 10 countries across 5 continents, reflecting a commitment to equity and diverse public health needs.

Though Pathoplexus launched only recently, it has already made an impact in the pathogen data-sharing community and, as of mid-November 2025, contains over 8,900 directly submitted sequences. Pathoplexus hosts the first available Ebola sequences from the February 2025 outbreak in Uganda and the September 2025 outbreak in the DRC, and contains over 900 Restricted-Use sequences, i.e. sequences that might not have been shared prior to publication without this protection. We aim to actively expand the number of supported pathogens through community feedback and engagement.

In parallel, Pathoplexus is working to broaden support to include raw read data, which would enhance insights into viral evolution, and to implement a federated network of nodes to foster decentralized sequence sharing. This will empower regions to maintain data ownership while contributing to a resilient, global system.

In this talk, I will introduce Pathoplexus and highlight its potential to enhance data sharing through collaboration and transparency for long-term public health impact.

 

Location: White City, School of Public Health Building
Room: SPH 202 – Seminar Room
In-person event. If you wish to join online, please register here
