Imperial College London

Dr Ovidiu Șerban

Faculty of Engineering, Department of Computing

Research Fellow in Intelligent Data Processing and Curation
 
 
 

Contact

 

o.serban · Website

 
 

Location

 

DSI Main Office, William Penney Laboratory, South Kensington Campus


Publications


8 results found

Ong C, Sun J, Serban O, Guo Y et al., 2023, TKGQA dataset: using question answering to guide and validate the evolution of temporal knowledge graph, Data, Vol: 8, Pages: 1-14, ISSN: 2306-5729

Temporal knowledge graphs can be used to represent the current state of the world and, as daily events happen, updating the temporal knowledge graph to stay consistent with the state of the world becomes very important. However, there is currently no reliable method to accurately validate the update and evolution of knowledge graphs. There has been a recent development in text summarisation whereby question answering is used to both guide and fact-check summarisation quality. The same process can be applied to the temporal knowledge graph update process. To the best of our knowledge, there is currently no dataset that connects temporal knowledge graphs with documents and question–answer pairs. In this paper, we propose the TKGQA dataset, consisting of over 5000 financial news documents related to M&A. Each document has extracted facts, question–answer pairs, and before and after temporal knowledge graphs, to highlight the state of temporal knowledge and any changes caused by the facts extracted from the document. As we parse through each document, we use question answering to check and guide the update process of the temporal knowledge graph.

Journal article
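
To make the QA-guided update check concrete, here is a minimal sketch of the idea described in the abstract above: facts extracted from a document are applied to a temporal KG, and question–answer pairs derived from the same document are then answered against the updated graph to validate the change. The Fact structure, the exact-match lookup, and all entity names are illustrative assumptions, not the dataset's actual schema.

```python
# Minimal sketch: QA-guided validation of a temporal KG update.
# All names and the lookup logic are hypothetical, not the TKGQA schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    timestamp: str  # e.g. "2023-02-10"

def apply_update(kg: set, extracted: list) -> set:
    """Produce the 'after' graph by adding facts extracted from a document."""
    return kg | set(extracted)

def answer(kg: set, subject: str, relation: str) -> set:
    """Answer a (subject, relation, ?) question with all matching objects."""
    return {f.obj for f in kg if f.subject == subject and f.relation == relation}

def validate_update(kg_after: set, qa_pairs: list) -> bool:
    """Check that every gold answer is recoverable from the updated graph."""
    return all(gold in answer(kg_after, subj, rel)
               for (subj, rel, gold) in qa_pairs)

kg_before = {Fact("AcmeCorp", "owns", "WidgetCo", "2020-05-01")}
extracted = [Fact("AcmeCorp", "acquired", "GizmoLtd", "2023-02-10")]
kg_after = apply_update(kg_before, extracted)

# QA pairs derived from the source document guide and check the update.
qa = [("AcmeCorp", "acquired", "GizmoLtd")]
print(validate_update(kg_after, qa))  # True if the update captured the fact
```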

Zhang W, Serban O, Sun J, Guo Y et al., 2023, IPPT4KRL: Iterative Post-Processing Transfer for Knowledge Representation Learning, Machine Learning and Knowledge Extraction, Vol: 5, Pages: 43-58, ISSN: 2504-4990

Knowledge Graphs (KGs), a structural way to model human knowledge, have been a critical component of many artificial intelligence applications. Many KG-based tasks are built using knowledge representation learning, which embeds KG entities and relations into a low-dimensional semantic space. However, the quality of representation learning is often limited by the heterogeneity and sparsity of real-world KGs. Multi-KG representation learning, which utilizes KGs from different sources collaboratively, presents one promising solution. In this paper, we propose a simple but effective iterative method that post-processes pre-trained knowledge graph embeddings (IPPT4KRL) on individual KGs to maximize the knowledge transfer from another KG when a small portion of alignment information is introduced. Specifically, additional triples are iteratively included in the post-processing based on their adjacency to the cross-KG alignments to refine the pre-trained embedding space of individual KGs. We also provide benchmarking results of existing multi-KG representation learning methods on several generated and well-known datasets. The empirical results of the link prediction task on these datasets show that the proposed IPPT4KRL method achieves comparable and even superior results compared with more complex multi-KG representation learning methods.

Journal article
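
The core loop of the post-processing idea can be sketched as follows: pull the pre-trained embeddings of aligned entity pairs towards each other, then collect additional triples adjacent to the alignments so they can feed the next refinement round. The midpoint update rule, the hyperparameters, and all names below are assumptions for illustration, not the paper's exact method.

```python
# Illustrative sketch of iterative post-processing transfer; the update
# rule and learning rate are assumptions, not the IPPT4KRL algorithm.
import numpy as np

def post_process(emb_a, emb_b, alignments, triples_b, n_iters=3, lr=0.5):
    """
    emb_a, emb_b : dict[str, np.ndarray]  pre-trained entity embeddings of KG A / KG B
    alignments   : set[tuple[str, str]]   seed cross-KG entity pairs (a, b)
    triples_b    : list[tuple[str, str, str]]  (head, relation, tail) triples in KG B
    """
    for _ in range(n_iters):
        # Move each aligned pair's embeddings toward their midpoint.
        for a, b in alignments:
            mid = (emb_a[a] + emb_b[b]) / 2
            emb_a[a] += lr * (mid - emb_a[a])
            emb_b[b] += lr * (mid - emb_b[b])
        # Include additional triples adjacent to the alignments; in the paper
        # these feed further refinement, here we only report their count.
        aligned_b = {b for _, b in alignments}
        adjacent = {(h, r, t) for (h, r, t) in triples_b
                    if h in aligned_b or t in aligned_b}
        print(f"adjacent transfer triples: {len(adjacent)}")
    return emb_a, emb_b

rng = np.random.default_rng(0)
emb_a = {"acme@A": rng.normal(size=8)}
emb_b = {"acme@B": rng.normal(size=8), "gizmo@B": rng.normal(size=8)}
triples_b = [("acme@B", "owns", "gizmo@B")]
post_process(emb_a, emb_b, {("acme@A", "acme@B")}, triples_b)
```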

Hilman D, Serban O, 2022, A unified Link Prediction architecture applied on a novel heterogenous Knowledge Base, Knowledge-Based Systems, Vol: 241, Pages: 1-17, ISSN: 0950-7051

Link Prediction (LP) aims at addressing the incompleteness of Knowledge Graphs (KGs). The goal of LP is to capture the distribution of entities and relations present in a KG and utilise these to predict the probability of missing information. State-of-the-art LP approaches rely on latent feature models for this purpose. The research focus has predominantly been on the application of LP to triple-based datasets (e.g. Freebase, YAGO). However, with the growing adoption of KGs, it is common to see more heterogeneous property graphs being used; examples of common properties are temporal and weight data. The contributions of the following work are twofold. First, we introduce a novel framework which is the first to provide support for latent feature model LP on heterogeneous Knowledge Bases (KBs). Second, we utilise a novel KB, the Refinitiv Knowledge Graph, to produce a heterogeneous dataset with which the capabilities of the framework are examined.

Journal article
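
As a rough illustration of latent feature model LP extended to a heterogeneous property graph, the toy scorer below combines a DistMult-style latent term with a learned weight on a scalar edge property (e.g. an edge weight or a normalised timestamp). The scoring form and all names are assumptions, not the framework's actual design.

```python
# Toy latent-feature link prediction with an extra edge-property term;
# entities, relations, and the scoring form are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16
entities = {"acme", "gizmo", "widget"}
relations = {"owns", "supplies"}

E = {e: rng.normal(size=DIM) for e in entities}  # entity embeddings
R = {r: rng.normal(size=DIM) for r in relations}  # relation embeddings
w_prop = rng.normal()  # learned weight on the scalar edge property

def score(h, r, t, prop=0.0):
    """DistMult score <e_h, w_r, e_t> plus a property contribution."""
    latent = float(np.sum(E[h] * R[r] * E[t]))
    return latent + w_prop * prop

# Rank candidate tails for (acme, owns, ?) given an edge weight of 0.8.
candidates = sorted(entities - {"acme"},
                    key=lambda t: score("acme", "owns", t, prop=0.8),
                    reverse=True)
print(candidates)
```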

Vaghela U, Rabinowicz S, Bratsos P, Martin G, Fritzilas E, Markar S, Purkayastha S, Stringer K, Singh H, Llewellyn C, Dutta D, Clarke JM, Howard M, Serban O, Kinross J et al., 2021, Using a secure, continually updating, web source processing pipeline to support the real-time data synthesis and analysis of scientific literature: development and validation study, Journal of Medical Internet Research, Vol: 23, Pages: 1-14, ISSN: 1438-8871

Background: The scale and quality of the global scientific response to the COVID-19 pandemic have unquestionably saved lives. However, the COVID-19 pandemic has also triggered an unprecedented “infodemic”; the velocity and volume of data production have overwhelmed many key stakeholders such as clinicians and policy makers, as they have been unable to process structured and unstructured data for evidence-based decision making. Solutions that aim to alleviate this data synthesis–related challenge are unable to capture heterogeneous web data in real time for the production of concomitant answers and are not based on the high-quality information in responses to a free-text query. Objective: The main objective of this project is to build a generic, real-time, continuously updating curation platform that can support the data synthesis and analysis of a scientific literature framework. Our secondary objective is to validate this platform and the curation methodology for COVID-19–related medical literature by expanding the COVID-19 Open Research Dataset via the addition of new, unstructured data. Methods: To create an infrastructure that addresses our objectives, the PanSurg Collaborative at Imperial College London has developed a unique data pipeline based on a web crawler extraction methodology. This data pipeline uses a novel curation methodology that adopts a human-in-the-loop approach for the characterization of quality, relevance, and key evidence across a range of scientific literature sources. Results: REDASA (Realtime Data Synthesis and Analysis) is now one of the world’s largest and most up-to-date sources of COVID-19–related evidence; it consists of 104,000 documents. By capturing curators’ critical appraisal methodologies through the discrete labeling and rating of information, REDASA rapidly developed a foundational, pooled, data science data set of over 1400 articles in under 2 weeks. These articles provide COVID-19–re…

Journal article
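
A schematic sketch of the pipeline shape described above: crawl literature sources, wrap each document, and record curators' discrete labels via a human-in-the-loop step. All function and field names here are hypothetical; the real REDASA pipeline is substantially more involved.

```python
# Schematic crawl-then-curate pipeline; names and fields are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Document:
    url: str
    text: str
    labels: dict = field(default_factory=dict)  # per-curator ratings

def crawl(sources: list) -> list:
    """Placeholder crawler: in practice a web crawler fetches and parses each source."""
    return [Document(url=s, text=f"<fetched body of {s}>") for s in sources]

def curate(doc: Document, curator: str, relevance: int, quality: int) -> Document:
    """Record a curator's discrete labels, capturing their appraisal of the document."""
    doc.labels[curator] = {"relevance": relevance, "quality": quality}
    return doc

corpus = crawl(["https://example.org/preprint-1", "https://example.org/preprint-2"])
corpus[0] = curate(corpus[0], curator="clinician-01", relevance=5, quality=4)
print(corpus[0].labels)
```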

Fernando S, Amador Díaz López J, Şerban O, Gómez-Romero J, Molina-Solana M, Guo Y et al., 2020, Towards a large-scale twitter observatory for political events, Future Generation Computer Systems, Vol: 110, Pages: 976-983, ISSN: 0167-739X

The explosion in the usage of social media has made its analysis a relevant topic of interest, particularly so in the political science area. Within Data Science, no other techniques are more widely accepted and appealing than visualisation. However, with datasets growing in size, visualisation tools also require a paradigm shift to remain useful in big data contexts. This work presents our proposal for a Large-Scale Twitter Observatory that enables researchers to efficiently retrieve, analyse and visualise data from this social network to gain actionable insights and knowledge related to political events. In addition to describing the supporting technologies, we put forward a working pipeline and validate the setup with different examples.

Journal article
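
The retrieve–analyse–visualise pipeline can be sketched with an in-memory stand-in for the tweet store; the filtering and aggregation steps below are illustrative assumptions rather than the observatory's actual components.

```python
# Toy retrieve -> analyse pipeline over an in-memory tweet store;
# a visualisation layer would consume the aggregated counts downstream.
from collections import Counter

tweets = [  # stand-in for data retrieved from a Twitter archive
    {"text": "Debate tonight #election", "hashtags": ["election"]},
    {"text": "Polls open early #election #vote", "hashtags": ["election", "vote"]},
    {"text": "Match day!", "hashtags": ["football"]},
]

def retrieve(store, topic):
    """Filter the store down to tweets tagged with a political topic."""
    return [t for t in store if topic in t["hashtags"]]

def analyse(subset):
    """Aggregate hashtag counts as a simple analysis step."""
    return Counter(h for t in subset for h in t["hashtags"])

print(analyse(retrieve(tweets, "election")).most_common())
```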

Fernando S, Scott-Brown J, Şerban O, Birch D, Akroyd D, Molina-Solana M, Heinis T, Guo Y et al., 2020, Open Visualization Environment (OVE): A web framework for scalable rendering of data visualizations, Future Generation Computer Systems, Vol: 112, Pages: 785-799, ISSN: 0167-739X

Scalable resolution display environments, including immersive data observatories, are emerging as equitable and socially engaging platforms for collaborative data exploration and decision making. These environments require specialized middleware to drive them, but, due to various limitations, there is still a gap in frameworks capable of scalable rendering of data visualizations. To overcome these limitations, we introduce a new modular open-source middleware, the Open Visualization Environment (OVE). This framework uses web technologies to provide an ecosystem for visualizing data using web browsers that span hundreds of displays. In this paper, we discuss the key design features and architecture of our framework as well as its limitations. This is followed by an extensive study on performance and scalability, which validates its design and compares it to the popular SAGE2 middleware. We show how our framework solves three key limitations in SAGE2. Thereafter, we present two of our projects that used OVE and show how it can extend SAGE2 to overcome limitations and simplify the user experience for common data visualization use-cases.

Journal article
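
One core geometry problem such middleware solves is mapping a single large virtual canvas onto a grid of browser windows, each rendering its own crop of the shared content. The sketch below shows that tiling calculation only; OVE itself is a JavaScript/web framework, and the grid dimensions here are hypothetical.

```python
# Tiling a virtual canvas across a display wall; numbers are hypothetical.
def viewports(canvas_w, canvas_h, cols, rows):
    """Yield (col, row, x_offset, y_offset, width, height) per display client."""
    tile_w, tile_h = canvas_w // cols, canvas_h // rows
    for row in range(rows):
        for col in range(cols):
            yield (col, row, col * tile_w, row * tile_h, tile_w, tile_h)

# A 4x2 wall of 1920x1080 displays forms one 7680x2160 canvas; each browser
# window translates the shared content by its own offset.
for vp in viewports(7680, 2160, cols=4, rows=2):
    print(vp)
```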

Hankin CL, Serban O, Thapen N, Maginnis B, Foot V et al., 2019, Real-time processing of social media with SENTINEL: a syndromic surveillance system incorporating deep learning for health classification, Information Processing and Management, Vol: 56, Pages: 1166-1184, ISSN: 0306-4573

Interest in real-time syndromic surveillance based on social media data has greatly increased in recent years. The ability to detect disease outbreaks earlier than traditional methods would be highly useful for public health officials. This paper describes a software system which is built upon recent developments in machine learning and data processing to achieve this goal. The system is built from reusable modules integrated into data processing pipelines that are easily deployable and configurable. It applies deep learning to the problem of classifying health-related tweets and is able to do so with high accuracy. It has the capability to detect illness outbreaks from Twitter data and then to build up and display information about these outbreaks, including relevant news articles, to provide situational awareness. It also provides nowcasting functionality of current disease levels from previous clinical data combined with Twitter data. The preliminary results are promising, with the system being able to detect outbreaks of influenza-like illness symptoms which could then be confirmed by existing official sources. The Nowcasting module shows that using social media data can improve prediction for multiple diseases over simply using traditional data sources.

Journal article
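
The health-classification step can be approximated with a compact text classification pipeline; the sketch below uses a scikit-learn model as a stand-in for the paper's deep learning classifier, with toy training data.

```python
# Toy stand-in for SENTINEL's health-tweet classifier; the model family
# and training examples are assumptions, not the paper's architecture.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "feeling feverish and coughing all night",
    "sore throat and chills, staying home",
    "great concert last night!",
    "traffic is terrible this morning",
]
train_labels = [1, 1, 0, 0]  # 1 = health-related, 0 = not

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_texts, train_labels)

# Streamed tweets would be scored like this and aggregated over time to
# flag outbreak signals and feed the nowcasting module.
print(model.predict(["high temperature and aching joints", "new phone day"]))
```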

Craggs B, Rashid A, Hankin C, Antrobus R, Serban O, Thapen N et al., 2019, A reference architecture for IIoT and industrial control systems testbeds

Conducting cyber security research within live operational technology and industrial Internet of Things environments is, understandably, not practical, and as such research needs to be undertaken within non-live mimics or testbeds. However, testbeds, especially those built using real-world infrastructure, are expensive to develop and maintain. Moreover, such testbeds tend to be representative of a single industry vertical (often based upon the skill set or research focus) and built in isolation. In this paper we present a reference architecture, developed whilst designing and building the Bristol Cyber Security Group ICS/IIoT testbed for critical national infrastructure security research.

Conference paper

