Glossary of Research Data Management terminology

Anonymised data: Data relating to a specific individual where the identifiers have been removed to prevent identification of that individual.

Archive: A place or collection containing static records, documents, or other materials for long-term preservation.

Big data: A broad term for datasets that are so large and complex that traditional processing applications are inadequate. Advanced methods can extract value from such data, for example in trend analysis, disease prevention and crime prevention. Challenges can include analysis, capture, curation, sharing, storage and information privacy.

Born digital: This refers to materials that originate in digital form, in contrast to analogue or physical materials that have been reformatted to become digital. Examples of born digital content include: websites, forums, wikis, digital documents and manuscripts, electronic records, data sets, digital photographs, digital art, digital media publications.

Data: The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

Things known or assumed as facts, making the basis of reasoning or calculation.

Data catalogue: A curated collection of metadata about datasets and their data elements.

Data lifecycle: Refers to all the stages in the existence of digital information from creation to destruction. A lifecycle view is used to enable active management of the data objects and resources over time, thus maintaining accessibility and usability.

Data management plan: A plan outlining what data will be created, how, and whether the necessary support is in place, making the research process easier by saving time and effort. Data management plans typically require researchers to cover:

  • description of the data to be collected / created
  • standards / methodologies for data collection and management
  • ethics and intellectual property concerns or restrictions
  • plans for data sharing and access
  • strategy for long-term preservation

Most funders have specific guidelines which should be referred to.

The Digital Curation Centre (DCC) provides a summary of data management plan (DMP) requirements and links to funders’ guidance.

Data sharing: The transfer of data between different organisations or individuals to improve the efficiency and effectiveness of research or service delivery, in line with current domestic legislation and the UK’s international obligations.

Dataset: The Freedom of Information Act defines a dataset as: information comprising a collection of information held in electronic form where all or most of the information in the collection:

Has been obtained or recorded for the purpose of providing a public authority with information in connection with the provision of a service by the authority or the carrying out of any other function of the authority and is factual information which:

  • is not the product of analysis or interpretation other than calculation, and
  • is not an official statistic (within the meaning given by section 6(1) of the Statistics and Registration Service Act 2007), and
  • remains presented in a way that (except for the purpose of forming part of the collection) has not been organised, adapted or otherwise materially altered since it was obtained or recorded.

De-anonymisation: The process of determining the identity of an individual to whom a pseudonymised dataset relates by cross-referencing with other sources of data.

Digital curation: The selection, preservation, maintenance, and archiving of digital objects in order to add value to collections of data for present and future use. Robust digital curation helps to mitigate digital obsolescence and to keep information accessible to users over the long term.

Digital Object Identifier (DOI): A persistent identifier for a digital object on a network. It is permanently assigned to an object, allowing the object to be referenced reliably even if its location or metadata change over time.
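
As a minimal sketch of how a DOI stays usable when an object moves, the Python function below turns a DOI into a link through the public resolver at https://doi.org/. The helper name doi_to_url and the DOI 10.1234/example.5678 are illustrative assumptions, not real records, and the syntax check is a simplification rather than a full validation of DOI syntax.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI (hypothetical helper)."""
    doi = doi.strip()
    # Accept the common written form "doi:10.1234/abc".
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Simplified check: a DOI has a prefix starting "10." and a suffix after "/".
    prefix, sep, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not sep or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(doi, safe="/")


# Illustrative DOI only; it does not point to a real object.
link = doi_to_url("doi:10.1234/example.5678")
```

Citing the resolver link rather than the current web address of the object means the reference need not change if the object is moved.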

Disclosive: Data is potentially disclosive if, despite the removal of obvious identifiers, characteristics of the dataset in isolation or in conjunction with other datasets in the public domain might lead to identification of the individual to whom a record belongs.

Intellectual property: A set of property rights that grant creators the right to protect the materials they create. Intellectual property comprises copyright, designs, patents, certain confidential information and trademarks.

Linked data: Data described by an identifier and an address to permit linking with other relevant data which might not otherwise be connected, improving discoverability. It may contain embedded links to other data.

Metadata: ‘Data about data’. Content, such as that in a digital repository, can be described in detail via the input of associated metadata. This acts like a catalogue record and allows searching across items within the repository. If the repository has implemented an appropriate metadata exposure method (such as Open Archives Initiative Protocol for Metadata Harvesting and/or RSS) the metadata can then be harvested by external services and exposed to the wider world.

Repositories use open standards to ensure that the content they contain can be searched and retrieved for later use. The use of these agreed international standards allows mechanisms to be set up which import, export, identify, store and retrieve the digital content within the repository.

Metadata enables research data to be discovered, understood and effectively re-used by others. Published research results should always include information on how to access the supporting metadata.
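
To illustrate what a metadata exposure method looks like in practice, the sketch below builds an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) request in Python. The ListRecords verb, the oai_dc (Dublin Core) metadata prefix and the optional from argument are defined by the protocol; the repository address https://repository.example.org/oai and the helper name build_harvest_url are assumptions for illustration only.

```python
from typing import Optional
from urllib.parse import urlencode


def build_harvest_url(base_url: str,
                      metadata_prefix: str = "oai_dc",
                      from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, not a real service.
url = build_harvest_url("https://repository.example.org/oai")
```

An external service would fetch this URL and receive an XML document of metadata records, which it could then index alongside records from other repositories.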

Mosaic effect: The process of combining anonymised data with auxiliary data in order to reconstruct identifiers linking data to the individual it relates to.
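
The mosaic effect can be sketched with a few made-up records. In the Python example below, every value, field name and the helper link_records are hypothetical; it only shows how characteristics left in an anonymised dataset can be matched against auxiliary data to reattach an identifier.

```python
# Anonymised research records: names removed, but postcode and birth year remain.
anonymised = [
    {"postcode": "AB1 2CD", "birth_year": 1980, "diagnosis": "asthma"},
    {"postcode": "EF3 4GH", "birth_year": 1975, "diagnosis": "diabetes"},
]

# Auxiliary data from another source (for example, a public register).
auxiliary = [
    {"name": "Person A", "postcode": "AB1 2CD", "birth_year": 1980},
    {"name": "Person B", "postcode": "ZZ9 9ZZ", "birth_year": 1990},
]


def link_records(anonymised, auxiliary, keys=("postcode", "birth_year")):
    """Attach names to anonymised records that share the given characteristics."""
    lookup = {}
    for row in auxiliary:
        lookup.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    linked = []
    for row in anonymised:
        names = lookup.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the individual
            linked.append({**row, "name": names[0]})
    return linked


reidentified = link_records(anonymised, auxiliary)
```

Here only the first record is re-identified, because its postcode and birth year match exactly one person in the auxiliary data.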

Open access: Provision of free access to peer-reviewed academic publications without subscription charges or paywalls.

Open data: Data that is:

  • accessible, ideally via the internet at no more than the cost of reproduction, without limitations based on user identity or intent
  • in a digital, machine-readable format for interoperation with other data
  • free of restriction on use or redistribution in its licensing conditions

Open data also refers to the availability of data underlying research outputs. A principle of Research Councils UK is that publicly funded research data is a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.

Open standards: Repositories use open standards to ensure that the content they contain is accessible in that it can be searched and retrieved for later use. The use of agreed international standards allows mechanisms to be set up which import, export, identify, store and retrieve the digital content within the repository.

Personal data: As defined by the Data Protection Act 1998, data relating to a specific individual where the individual is identified or identifiable in the hands of a recipient of the data.

Pseudonymised data: Data relating to a specific individual where the identifiers have been replaced by artificial identifiers to prevent identification of the individual.
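
One possible way to replace identifiers with artificial ones is sketched below in Python. The function pseudonymise, the field names and the sample records are all hypothetical, and random tokens are only one of several approaches; this is not a recommended or complete procedure.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace the identifier in each record with an artificial one.

    Returns the pseudonymised records and the key table that links
    original identifiers to their pseudonyms.
    """
    key_table = {}  # original identifier -> pseudonym; keep separate and secure
    output = []
    for row in records:
        original = row[id_field]
        if original not in key_table:
            key_table[original] = "P-" + secrets.token_hex(8)
        # Copy every field except the identifier, then add the pseudonym.
        new_row = {k: v for k, v in row.items() if k != id_field}
        new_row["pseudonym"] = key_table[original]
        output.append(new_row)
    return output, key_table


records = [
    {"name": "Person A", "score": 12},
    {"name": "Person B", "score": 15},
    {"name": "Person A", "score": 14},
]
pseudonymised, key_table = pseudonymise(records)
```

The same individual receives the same pseudonym throughout, so records can still be grouped per person. Anyone holding the key table can reverse the replacement, which is why it would need to be stored apart from the shared data.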

Raw data: Data that has not been processed for meaningful use. Although raw data has the potential to become "information," it requires selective extraction, organisation, and sometimes analysis and formatting for presentation. As a result of processing, raw data sometimes ends up in a database, which enables the data to become accessible for further processing and analysis in a number of different ways.

Repository: A digital repository is a means of managing, storing and providing access to digital content. It is where digital assets are stored and managed to facilitate searching and retrieval for later use. A repository supports mechanisms to import, export, identify, store, preserve and retrieve digital assets.

Putting digital content into an institutional repository enables institutions to manage and preserve it, and therefore derive maximum value from it. A repository can support research, teaching, learning, and administrative processes. Although many institutional repositories are primarily established for the benefit of the organisation and its users, there is an increasing movement towards open access to the wider community, sometimes in a global sense.

Research: The College defines a research project as one that constitutes a ‘useful subject of study’: it must be in a subject, or be directed towards establishing an outcome, which is both of value and calculated to promote the College’s charitable aims in a meaningful and direct way. ‘Useful’ means that the project should be capable of increasing or enhancing knowledge, understanding and learning, rather than being ‘useful’ in a practical sense. It must involve the creation and/or acquisition of new knowledge, and does not include the application of existing knowledge or expertise to solve specific problems where nothing novel is being created. A project must be undertaken ‘for the public benefit’ and with the intention that the useful knowledge it generates will be disseminated to the public and others who are able to utilise or benefit from it. Any useful knowledge generated must be disseminated.

Research Excellence Framework (REF): The REF is the system for assessing research in UK universities. The results are used to determine public funding for universities' research, and affect their reputations. Through the REF, expert panels made up of both practising researchers and research users assess the academic excellence of research as well as the impact of research beyond academia. Universities can submit all types of research, funded from any source.

Research impact: A recorded or otherwise auditable occasion of influence from academic research on another actor or organisation. Impact is usually demonstrated by pointing to a record of the active consultation, consideration, citation, discussion, referencing or use of a piece of research. Research has an academic impact when the influence is upon another researcher, university organisation or academic author, usually demonstrated by citation indicators.

Research output: The finished product(s) of the research, rather than the components that make it up. The following are considered research outputs in the Research Excellence Framework:

  • Books (authored, edited, chapters, scholarly editions)
  • Journal articles (published articles, conference contributions, working papers)
  • Physical artefacts, devices and products
  • Exhibitions and performances
  • Patents and published patent applications
  • Composition
  • Designs
  • Research reports and confidential reports for external bodies
  • Digital (software, website content, digital or visual media, research datasets and databases)
  • E-theses, teaching materials and administrative data can also be considered research outputs

CASRAI dictionary (now managed by CODATA)

If you are unable to find a term here, a more detailed Glossary has been developed as a practical reference for individuals and working groups. A regularly updated ‘living document’, this Glossary provides a URL allowing researchers to link directly to RDM terms should they wish to.

CASRAI Dictionary