New study uses question answering to help ensure that knowledge graphs, used in recommendation systems and chatbots, are updated accurately.
In a recent study, published in Data, PhD student Ryan Ong, Research Fellow Dr Ovidiu Serban, Honorary Research Fellow Dr Jiahao Sun and Co-Director Professor Yi-Ke Guo from the Data Science Institute created a new dataset called TKGQA (Temporal Knowledge Graph Question Answering) to help researchers develop better ways to update and validate knowledge graphs using question answering.
Knowledge graphs are like databases that represent information about the world and are increasingly being used in applications such as recommendation systems, chatbots and semantic searches.
It is important to ensure these knowledge graphs are being updated accurately with the latest information so that these applications can perform better and faster.
Through this research, in collaboration with the Royal Bank of Canada, Imperial data scientists have created a way to more accurately update the knowledge graphs using a dataset that more closely represents the real world.
What are knowledge graphs?
A knowledge graph represents the world through structured facts made up of entities and relationships. Entities are real-world objects such as people, companies and countries, and relationships capture how those objects relate to one another.
Most research into knowledge graphs has focused on static knowledge graphs, where facts remain unchanged over time. However, this does not reflect the real world, where facts change. To capture those changes, a knowledge graph must become temporal.
While similar datasets exist, TKGQA is the first to represent the real world more closely, as it combines entity and relationship extraction rather than using only one or the other.
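As a rough illustration of the difference between static and temporal facts, the sketch below stores each fact with the period in which it holds. It is not taken from the study: the `TemporalFact` class, the `facts_at` helper and the example names and years are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TemporalFact:
    """A fact (subject, relation, object) plus the years in which it holds.

    Hypothetical structure for illustration; not the TKGQA schema.
    """
    subject: str
    relation: str
    obj: str
    valid_from: int                  # first year the fact holds
    valid_to: Optional[int] = None   # first year it no longer holds; None = still valid


def facts_at(facts: List[TemporalFact], year: int) -> List[TemporalFact]:
    """Return the facts that hold in the given year."""
    return [
        f for f in facts
        if f.valid_from <= year and (f.valid_to is None or year < f.valid_to)
    ]


# Invented example: the chief executive of a company changes in 2020.
facts = [
    TemporalFact("Alice", "ceo_of", "ExampleCorp", 2015, 2020),
    TemporalFact("Bob", "ceo_of", "ExampleCorp", 2020),
]

for year in (2018, 2021):
    print(year, [f.subject for f in facts_at(facts, year)])
```

A static knowledge graph would keep only one of the two facts, while the temporal version can answer the same question differently depending on the year asked about.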
Question answering
Currently, it is challenging to accurately validate the evolution of a temporal knowledge graph and there is no reliable method to verify the updated knowledge graphs once information has been altered.
The TKGQA dataset was created to address these challenges by providing a more realistic basis for developing methods to update and verify knowledge graphs through the use of question answering.
The researchers collated more than 5,000 financial news documents as part of the TKGQA dataset, and for each document extracted facts, question-answer pairs, and the temporal knowledge graphs before and after the update. Each document has a set of questions, either human-generated or model-generated, paired with answers, so that the generated answers can be checked against those known to be true.
This question-answering process can be used to fact-check the update process of temporal knowledge graphs to ensure the most accurate and up-to-date information is being displayed.
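A minimal sketch of this kind of check is shown below, assuming a knowledge graph is held as a set of (subject, relation, object) triples and each question has already been reduced to a subject, a relation and an expected answer. These representations, the function names and the example data are assumptions made for illustration and are not the researchers' method.

```python
def answer(graph, subject, relation):
    """Return every object linked to `subject` by `relation` in the graph."""
    return {o for (s, r, o) in graph if s == subject and r == relation}


def validate_update(graph, qa_pairs):
    """Return the fraction of questions whose expected answer is in the graph."""
    if not qa_pairs:
        return 0.0
    correct = sum(
        1 for (subject, relation, expected) in qa_pairs
        if expected in answer(graph, subject, relation)
    )
    return correct / len(qa_pairs)


# Invented graphs before and after processing one news document.
before = {("ExampleCorp", "ceo", "Alice")}
after = {
    ("ExampleCorp", "ceo", "Bob"),
    ("ExampleCorp", "acquired", "SampleLtd"),
}

# Invented question-answer pairs derived from the same document.
qa_pairs = [
    ("ExampleCorp", "ceo", "Bob"),
    ("ExampleCorp", "acquired", "SampleLtd"),
]

print("before update:", validate_update(before, qa_pairs))
print("after update:", validate_update(after, qa_pairs))
```

If the updated graph answers more of the document's questions correctly than the graph before the update, that suggests the update captured the new facts.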
According to lead author Ryan Ong: “This is a challenging task because it requires not only identifying relevant information in the text but also understanding the temporal relationships between events and entities mentioned.”
Unseen entities and relations
The researchers hope to use the TKGQA dataset in future research into low-resource temporal link prediction.
Temporal link prediction means predicting future connections between two entities based on their past interactions. Low-resource temporal link prediction is the same task when only limited data is available to train the predictive model.
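To make the idea concrete, the sketch below scores pairs of entities by their past interactions, giving recent interactions more weight, and predicts the highest-scoring pair as the most likely future connection. This is a naive heuristic chosen for illustration, not a method from the study; the function names, the `decay` parameter and the example events are hypothetical.

```python
from collections import defaultdict


def score_links(events, now, decay=0.5):
    """Score each entity pair by its recency-weighted past interactions.

    `events` is a list of (entity_a, entity_b, time) tuples. An interaction
    that happened `now - time` steps ago contributes decay ** (now - time).
    """
    scores = defaultdict(float)
    for a, b, t in events:
        pair = tuple(sorted((a, b)))  # treat links as undirected
        scores[pair] += decay ** (now - t)
    return dict(scores)


def predict_link(events, now, decay=0.5):
    """Return the pair judged most likely to interact next, or None."""
    scores = score_links(events, now, decay)
    return max(scores, key=scores.get) if scores else None


# Invented interaction history: A and B interacted twice, but longer ago.
events = [("A", "B", 1), ("A", "B", 2), ("A", "C", 3)]

print(score_links(events, now=4))
print(predict_link(events, now=4))
```

A heuristic like this needs no training data, but it can only rank pairs that have already interacted, which is one reason predicting links for previously unseen entities and relations is harder.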
The new dataset will help develop techniques and models that can effectively use the limited data available to make accurate predictions about future relationships and entities that are otherwise ‘unseen’.
This research can help enhance decision-making processes across a wide range of applications: helping to improve forecasting, understanding the spread of information and identifying patterns to help optimise systems for the benefit of society.
‘TKGQA Dataset: Using Question Answering to Guide and Validate the Evolution of Temporal Knowledge Graph’ by Ong et al., published on 14 March 2023 in Data.