What is research data management? 

Data management is increasingly recognised as an essential part of good research practice. Responsibly managed data are important for research integrity, transparency and open science. Many funders now expect data that support published findings, or that have potential for reuse in future research, to be made publicly available with as few restrictions as possible, wherever legal and ethical constraints allow.

Research data management refers to how you will look after the data you collect or generate during your research. It covers activities such as planning for your data management needs at the start of your project; organising, storing and securing data during your project; and ensuring long-term preservation, data sharing and reuse at the end of your project.

Why manage research data? 

Good research data management:

  • Reduces the risk of data loss
  • Makes it easier to find and understand data
  • Helps make data authentic, accurate and reliable
  • Improves research integrity and reproducibility
  • Enables compliance with funder and publisher policies
  • Facilitates data sharing and reuse

What are research data?  

Research data are any materials that you collect or generate during your research project that can be used to support or verify your research findings. The UKRI Concordat on Open Research Data defines research data as ‘… the evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical)’.

Research data can be generated or collected for different purposes and through different processes: 

  • Observational: data captured in real time, usually irreplaceable, e.g. sensor data, survey data, sample data, neuroimages
  • Experimental: data from laboratory equipment, often reproducible but potentially expensive to reproduce, e.g. gene sequences, chromatograms, toroid magnetic field data
  • Simulation: data generated from test models, where the model and metadata are more important than the output data, e.g. climate models, economic models
  • Derived or compiled: data that can be reproduced, but at a cost, e.g. text and data mining, compiled databases, 3D models
  • Reference or canonical: a (static or organic) conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated, e.g. gene sequence databanks, chemical structures, spatial data portals

Examples of research data include:

  • text documents
  • spreadsheets
  • audio and video recordings
  • photographs, films
  • collections of digital objects
  • questionnaires, transcripts of interviews
  • sensor readings
  • models, algorithms, scripts