Data without sufficient contextual information is little better than random noise. It is therefore vital to keep such information alongside the data, so that the data can easily be used and reused without confusion. Useful information to record includes:
- for what purpose the data was captured/generated, by whom and when
- information necessary to interpret the data (e.g. experimental conditions, statistical sampling, calibration information)
- rights and responsibilities, including licensing (if the data is shared) or conditions of access (if access is restricted)
Try to keep this information up to date and hold it in the same storage location as the data itself. For some types of data it may be possible to record this information directly in the data file (see choosing file formats).
The documentation could be as simple as a "README" file kept with the data, but where possible it should be recorded in a format that can be processed by computer. This makes archiving or publishing the data much easier, for example. This type of documentation, especially when formalised, is referred to as "metadata" ("data about data").
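As an illustration of machine-readable documentation, the following minimal sketch writes the information listed above to a JSON file stored next to the data file. The function name `write_metadata_sidecar`, the `.metadata.json` suffix, the key names and all example values are assumptions made for this illustration; they do not follow any particular standard.

```python
import json
from pathlib import Path

# Keys mirroring the kinds of information listed above (hypothetical names).
REQUIRED_KEYS = {"purpose", "creator", "created", "interpretation", "rights"}


def write_metadata_sidecar(data_file: Path, metadata: dict) -> Path:
    """Write metadata as JSON next to data_file and return the new file's path."""
    missing = REQUIRED_KEYS - metadata.keys()
    if missing:
        raise ValueError(f"missing metadata fields: {sorted(missing)}")
    # e.g. results.csv -> results.csv.metadata.json, in the same folder
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar


# Hypothetical example values, for illustration only.
example_metadata = {
    "purpose": "Calibration run for a temperature sensor",
    "creator": "A. Researcher",
    "created": "2024-01-15",
    "interpretation": {
        "units": "degrees Celsius",
        "sampling": "one reading per minute",
    },
    "rights": "CC BY 4.0",
}
```

Calling `write_metadata_sidecar(Path("results.csv"), example_metadata)` would create `results.csv.metadata.json` in the same folder as the data, so the two files stay together when the folder is copied or archived.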
If you want your data to be future-proof and interoperable within your research community, try to describe it using one or more existing standard metadata formats ("schemas"). Some commonly used standards are:
|Discipline||Metadata standard||Tools||Notes|
|Statistical information||Statistical Data and Metadata eXchange (SDMX)||SDMX Tool Repository|| |
|Lab-based research||Investigation-Study-Assay (ISA)||ISA Tools website||Required for submissions to Nature Scientific Data|
|Social and behavioural sciences (also suitable for medicine)||Data Documentation Initiative (DDI)||DDI Alliance tools list|| |
|Generic||Dublin Core Metadata Initiative (DCMI)|| ||Basic identification of authorship and related information|
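To give a flavour of what a standard schema looks like in practice, the sketch below builds a small XML record using the fifteen elements of the Dublin Core Metadata Element Set, version 1.1. The function name `dublin_core_record`, the wrapper element `metadata` and the example values are assumptions for illustration; a repository or archive will usually prescribe its own wrapper and required fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements defined by the element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict) -> str:
    """Return an XML string with one dc:* child element per entry in fields."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example values, for illustration only.
record = dublin_core_record({
    "title": "Temperature sensor calibration data",
    "creator": "A. Researcher",
    "date": "2024-01-15",
    "rights": "CC BY 4.0",
})
```

Because the element names come from a shared standard, software that understands Dublin Core can read the authorship, date and rights from `record` without knowing anything else about the dataset.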