How simple question and answering can improve the accuracy of knowledge graphs


Visual representation of a knowledge graph, created by PhD student Ryan Ong using artificial intelligence

A new study uses question answering to help ensure that knowledge graphs, which are used in recommendation systems and chatbots, are updated accurately.

In a recent study published in Data, PhD student Ryan Ong, Research Fellow Dr Ovidiu Serban, Honorary Research Fellow Dr Jiahao Sun and Co-Director Professor Yi-Ke Guo from the Data Science Institute created a new dataset called TKGQA (Temporal Knowledge Graph Question Answering). The dataset is intended to help researchers develop better ways to update and validate knowledge graphs using question answering.

Knowledge graphs are like databases that represent information about the world and are increasingly being used in applications such as recommendation systems, chatbots and semantic searches.

It is important to ensure these knowledge graphs are being updated accurately with the latest information so that these applications can perform better and faster.

Through this research, carried out in collaboration with the Royal Bank of Canada, Imperial data scientists have created a way to update knowledge graphs more accurately, using a dataset that more closely represents the real world.

What are knowledge graphs?

A knowledge graph represents the world through structured facts made up of entities and relationships. Entities are real-world objects such as people, companies and countries, and relationships capture how these objects relate to one another.

Most research into knowledge graphs has focused on static knowledge graphs, where facts remain unchanged over time. However, this does not reflect the real world. To capture changes over time, the knowledge graph must become temporal.
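As a rough illustration of the difference (a minimal sketch, not the representation used in the study), a static fact can be written as a (subject, relation, object) triple, and a temporal fact as the same triple with a period during which it holds. The entity names, the `TemporalFact` class and the `facts_at` helper below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    subject: str
    relation: str
    obj: str
    start: date                  # first day the fact holds
    end: Optional[date] = None   # None means the fact still holds


def facts_at(facts, when):
    """Return the (subject, relation, object) triples valid on `when`."""
    return {
        (f.subject, f.relation, f.obj)
        for f in facts
        if f.start <= when and (f.end is None or when < f.end)
    }


# Hypothetical example data: a company changes its chief executive.
graph = [
    TemporalFact("ExampleCorp", "has_ceo", "Alice", date(2018, 1, 1), date(2022, 6, 1)),
    TemporalFact("ExampleCorp", "has_ceo", "Bob", date(2022, 6, 1)),
]

snapshot_2020 = facts_at(graph, date(2020, 1, 1))
snapshot_2023 = facts_at(graph, date(2023, 1, 1))
```

A static graph would have to pick one chief executive. Here the two snapshots give different answers depending on the date asked about.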

While similar datasets exist, TKGQA is the first to represent the real world more closely, as it combines entity extraction and relationship extraction rather than relying on just one of them.

Question and answering

Currently, it is challenging to accurately validate the evolution of a temporal knowledge graph and there is no reliable method to verify the updated knowledge graphs once information has been altered.

The TKGQA dataset was created to address these challenges by providing a more realistic basis for developing methods to update and verify knowledge graphs through question answering.

The researchers collated over 5,000 financial news documents as part of the TKGQA dataset. For each document, they extracted facts, question-answer pairs, and temporal knowledge graphs from before and after the update. Each document has a set of questions, either human-generated or model-generated, with paired answers, so that generated answers can be checked against those known to be true.

This question-answering process can be used to fact-check the update process of temporal knowledge graphs to ensure the most accurate and up-to-date information is being displayed.
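The idea of fact-checking an update with question-answer pairs can be sketched as follows. This is a simplified illustration under stated assumptions, not the method from the paper: each question is assumed to have already been mapped to a (subject, relation) lookup with one expected answer, whereas real questions are written in natural language. The function names and example data are hypothetical.

```python
def answer_from_graph(graph, subject, relation):
    """Return every object linked to `subject` by `relation` in the graph."""
    return {o for (s, r, o) in graph if s == subject and r == relation}


def validate_update(graph, qa_pairs):
    """Return the fraction of QA pairs whose expected answer is found in the graph."""
    if not qa_pairs:
        return 0.0
    correct = sum(
        1
        for (subject, relation, expected) in qa_pairs
        if expected in answer_from_graph(graph, subject, relation)
    )
    return correct / len(qa_pairs)


# Hypothetical graphs before and after processing a news document.
before = {("ExampleCorp", "has_ceo", "Alice")}
after = {
    ("ExampleCorp", "has_ceo", "Bob"),
    ("ExampleCorp", "acquired", "SampleLtd"),
}

# Hypothetical questions derived from the document, with known answers.
qa_pairs = [
    ("ExampleCorp", "has_ceo", "Bob"),
    ("ExampleCorp", "acquired", "SampleLtd"),
]

score_before = validate_update(before, qa_pairs)
score_after = validate_update(after, qa_pairs)
```

A correctly updated graph should answer more of the document's questions than the graph it replaced, so a score that fails to rise after an update suggests the update missed or mangled a fact.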

According to lead author Ryan Ong: “This is a challenging task because it requires not only identifying relevant information in the text but also understanding the temporal relationships between events and entities mentioned.”

Unseen entities and relations

Moving forward, the researchers hope to use the TKGQA dataset in future research into low-resource temporal link prediction.

Temporal link prediction means predicting future connections between two entities based on their past interactions. Low-resource temporal link prediction is the same task when only limited data is available to train the predictive model.
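To make the task concrete, here is a toy heuristic that ranks entity pairs by how often and how recently they have interacted. It is a deliberately naive baseline for illustration only, not a model from the study, and the function names, the `half_life` parameter and the example data are all assumptions.

```python
from collections import defaultdict


def score_pairs(interactions, now, half_life=30.0):
    """Score each entity pair by its past interactions, weighting recent ones more."""
    scores = defaultdict(float)
    for a, b, t in interactions:
        pair = tuple(sorted((a, b)))   # treat links as undirected
        age = now - t                  # time elapsed since the interaction
        scores[pair] += 0.5 ** (age / half_life)
    return dict(scores)


def predict_links(interactions, now, top_k=1):
    """Return the `top_k` pairs judged most likely to interact again."""
    scores = score_pairs(interactions, now)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical interactions: (entity, entity, day number).
history = [
    ("BankA", "FirmB", 10),
    ("BankA", "FirmB", 80),
    ("BankA", "FirmC", 5),
    ("FirmB", "BankA", 95),
]

top = predict_links(history, now=100)
```

A heuristic like this can only rank pairs that have already interacted, which is one reason the low-resource setting, with few or no past interactions to draw on, is harder.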

The new dataset will help develop techniques and models that can effectively use the limited data available to make accurate predictions about future relationships and entities that are otherwise ‘unseen’.

This research can help enhance decision-making processes across a wide range of applications: helping to improve forecasting, understanding the spread of information and identifying patterns to help optimise systems for the benefit of society. 




‘TKGQA Dataset: Using Question Answering to Guide and Validate the Evolution of Temporal Knowledge Graph’ by Ong et al., published on 14 March 2023 in Data.

 

Reporter

Gemma Ralton

Faculty of Engineering
