Data documentation and metadata
What is data documentation?
Data documentation provides the contextual information needed to discover, understand, access and reuse research data. Without this information it may be impossible for future users, including yourself, to understand your data.
Examples of data documentation include:
- laboratory notebooks & experimental protocols
- questionnaires, codebooks, data dictionaries
- software syntax and output files
- information about equipment settings & instrument calibration
- database schema
- methodology reports
- provenance information about sources of derived or digitised data
Why document data?
Data documentation is essential for the reproducibility and replication of research findings and the re-analysis of data. Ensuring that data are adequately documented and described supports research transparency and facilitates data sharing and reuse. Documenting your data also minimises the risk of your data being misused or misinterpreted.
When should I document my data?
You should begin documenting your data at the start of your project and continue adding information as you go. It is much easier to capture information as the project progresses than to try to remember later what you did.
What information should I include?
Data can be described at different levels:
Project-level documentation provides information about the aims of the study, what the research questions were, methods of data collection, instruments used, how the data were processed, who collected the data and when, and how the data can be accessed.
File-level documentation provides descriptions of the contents of a folder or dataset including details of data types, file formats used, and relations between files contained in the folder or dataset. A README.txt file is a form of documentation commonly used for this purpose.
Variable-level documentation provides definitions and explanations of variables, values, units of measurement, missing values and any other codes or abbreviations used. This information can be embedded within the data file itself, documented separately in a data dictionary or codebook, or included in a README file.
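As an illustration of variable-level documentation, the sketch below writes a small data dictionary to CSV and checks a dataset's column names against it. The variable names, column headings and missing-value code are hypothetical, chosen only for the example; adapt them to your own data.

```python
import csv
import io

# Column headings for the data dictionary (an assumption, not a standard).
FIELDS = ["variable", "description", "type", "units", "missing_code"]

# Hypothetical variables for an illustrative survey dataset.
DATA_DICTIONARY = [
    {"variable": "participant_id", "description": "Anonymised participant identifier",
     "type": "string", "units": "", "missing_code": ""},
    {"variable": "age", "description": "Age at time of interview",
     "type": "integer", "units": "years", "missing_code": "-99"},
    {"variable": "height", "description": "Standing height",
     "type": "float", "units": "cm", "missing_code": "-99"},
]


def write_data_dictionary(entries, handle):
    """Write one CSV row per variable to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)


def undocumented_variables(column_names, entries):
    """Return the dataset columns that have no entry in the data dictionary."""
    documented = {entry["variable"] for entry in entries}
    return [name for name in column_names if name not in documented]


if __name__ == "__main__":
    buffer = io.StringIO()
    write_data_dictionary(DATA_DICTIONARY, buffer)
    print(buffer.getvalue())
    # "weight" is in the dataset but not yet described in the dictionary.
    print(undocumented_variables(["participant_id", "age", "weight"], DATA_DICTIONARY))
```

Running a check like `undocumented_variables` each time a column is added is one way to keep the documentation in step with the data as the project progresses.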
A README file template which you can download and adapt for your data is available from Cornell University.
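If you prefer to generate a first draft automatically, the following minimal sketch builds README text that lists every file in a folder with its format and size. The headings used here ("Title", "Description", "Files") and the function name `build_readme` are assumptions made for illustration; they do not reproduce the Cornell template, and a real README would also need the contextual details described above.

```python
import tempfile
from pathlib import Path


def build_readme(title, description, root):
    """Return README text listing each file under root with its format and size."""
    root = Path(root)
    lines = [f"Title: {title}", f"Description: {description}", "", "Files:"]
    files = sorted(p for p in root.rglob("*") if p.is_file())
    for path in files:
        relative = path.relative_to(root).as_posix()
        # Use the file extension as a rough indication of the file format.
        file_format = path.suffix.lstrip(".") or "no extension"
        lines.append(f"- {relative} ({file_format}, {path.stat().st_size} bytes)")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Demonstrate on a temporary folder containing one small data file.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "results.csv").write_bytes(b"a,b\n1,2\n")
        print(build_readme("Demo dataset", "Example for illustration only", tmp))
```

The generated text is only a starting point: relationships between files and the meaning of their contents still have to be written by someone who knows the data.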
What is metadata?
Metadata is commonly defined as ‘information about data’. The terms metadata and documentation are sometimes used interchangeably, but metadata also refers specifically to information that is structured and machine-readable. Some research communities use domain-specific metadata standards. Links to discipline-specific metadata standards are available on these websites:
- Research Data Alliance Metadata Repository
- Digital Curation Centre Disciplinary Metadata
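To show what "structured and machine-readable" can mean in practice, here is a minimal sketch that stores a metadata record as JSON and checks it for missing fields. The field names are loosely borrowed from Dublin Core element names purely as an example, and the record values and the list of required fields are hypothetical; your discipline's standard may require a different set.

```python
import json

# Fields treated as mandatory in this sketch (an assumption, not a standard's rule).
REQUIRED_FIELDS = ("title", "creator", "date", "description", "rights")

# Hypothetical record describing an illustrative dataset.
EXAMPLE_RECORD = {
    "title": "Example interview survey",
    "creator": "A. Researcher",
    "date": "2024-01-31",
    "description": "Anonymised responses to a structured questionnaire.",
    "format": "text/csv",
    "rights": "CC BY 4.0",
}


def to_json(record):
    """Serialise a metadata record as indented JSON with keys in a stable order."""
    return json.dumps(record, indent=2, sort_keys=True)


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in required if not record.get(name)]


if __name__ == "__main__":
    print(to_json(EXAMPLE_RECORD))
    print(missing_fields({"title": "Untitled"}))
```

Because the record has a fixed structure, software can validate, index and search it, which is what distinguishes metadata in this sense from free-text documentation.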