Data without sufficient contextual information is little better than random noise: it is therefore vital to keep such information alongside the data so that it can easily be used and reused without confusion. Useful information to record includes:

  • for what purpose the data was captured/generated, by whom and when
  • information necessary to interpret the data (e.g. experimental conditions, statistical sampling, calibration information)
  • rights and responsibilities, including licensing (if the data is shared) or conditions of access (if access is restricted)

Try to keep this information up to date and hold it in the same storage location as the data itself. For some types of data it may be possible to record this information directly in the data file; see choosing file formats.

This documentation could be as simple as a "README" file kept with the data, but where possible it should be recorded in a format that can be processed by computer, which makes archiving and publishing the data much easier. Documentation of this type, especially when formalised, is referred to as "metadata" ("data about data").
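
As an illustration of a README that a computer can also process, the sketch below writes the items listed above (purpose, creator, date, interpretation notes, rights) to a JSON file stored next to the data file. The function name `write_sidecar`, the `.meta.json` suffix and all field names and values are assumptions made for this example, not part of any standard.

```python
import json
import tempfile
from pathlib import Path


def write_sidecar(data_file, metadata):
    """Write metadata as JSON next to data_file and return the sidecar path.

    The ".meta.json" suffix is a hypothetical naming convention.
    """
    data_file = Path(data_file)
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(
        json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8"
    )
    return sidecar


# Hypothetical record: every field name and value here is illustrative only.
example_metadata = {
    "purpose": "Calibration run for a temperature sensor",
    "created_by": "A. Researcher",
    "created_on": "2024-01-15",  # ISO 8601 date
    "interpretation": {
        "conditions": "Ambient laboratory conditions",
        "calibration": "Offsets stored in the 'offset' column",
    },
    "rights": {"licence": "CC BY 4.0", "access": "open"},
}

if __name__ == "__main__":
    # Demonstrate on a throwaway file so that nothing real is overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "readings.csv"
        data.write_text("time,offset\n0,0.1\n", encoding="utf-8")
        print(write_sidecar(data, example_metadata).read_text(encoding="utf-8"))
```

Keeping the sidecar in the same directory as the data, with a name derived from the data file, means the two are copied and archived together.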

Standards

If you want your data to remain usable in the future and to be interoperable with the tools and data of others in your community, try to use one or more existing standard metadata formats ("schemas") to describe your data.

Common metadata formats
Coverage | Metadata standard | Tools | Notes
Statistical information | Statistical Data and Metadata eXchange (SDMX) | SDMX Tool Repository | -
Lab-based research | Investigation-Study-Assay (ISA) | ISA Tools website | Required for submissions to Nature Scientific Data
Social and behavioural sciences (also suitable for medicine) | Data Documentation Initiative (DDI) | DDI Alliance tools list | -
Generic | Dublin Core Metadata Initiative (DCMI) | - | Basic identification of authorship and related information
 

Search for discipline-specific metadata standards (Digital Curation Centre)
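
To give a feel for what a standard schema looks like in practice, here is a minimal sketch that builds a record from the fifteen elements of the generic Dublin Core element set using only the Python standard library. The namespace URI is the published Dublin Core one; the wrapping `metadata` element, the function name `dublin_core_record` and the sample values are assumptions made for this example. A real repository may expect a different wrapper or a richer profile.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements defined by the Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Return an XML string with one dc:* element per entry in fields.

    The <metadata> wrapper is a hypothetical container chosen for this sketch.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only.
    print(dublin_core_record({
        "title": "Temperature sensor calibration readings",
        "creator": "A. Researcher",
        "date": "2024-01-15",
        "rights": "CC BY 4.0",
    }))
```

Rejecting unknown element names early is a design choice for the sketch: it catches typos such as "author" (the Dublin Core term is "creator") before the record is written.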