Data Capture and Documentation
1 Data capture
Data capture should always be done in a consistent way. It is strongly recommended to develop Standard Operating Procedures (SOPs) that clearly define the steps to be taken and outline roles and responsibilities. SOPs are useful even for single-person projects, because they ensure consistency over time. Besides describing the experimental setup and the applied procedures, SOPs should also specify when documentation is created, where files are stored, and how they are named.
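One way to make the file naming part of an SOP checkable is to write the rule down as a small function. The sketch below assumes a purely hypothetical convention, `<YYYYMMDD>_<project>_<sample>_v<NN>.<ext>`; the pattern and the function name `sop_file_name` are illustrative only and should be replaced by whatever your own SOP defines.

```python
import re
from datetime import date

# Hypothetical SOP naming rule (illustration only):
# <YYYYMMDD>_<project>_<sample>_v<NN>.<ext>, lowercase, no spaces
NAME_PATTERN = re.compile(r"^\d{8}_[a-z0-9]+_[a-z0-9]+_v\d{2}\.[a-z0-9]+$")


def sop_file_name(day: date, project: str, sample: str, version: int, ext: str) -> str:
    """Build a file name following the hypothetical SOP convention above.

    Raises ValueError if the parts cannot form a conforming name
    (e.g. spaces or underscores in the project name, version > 99).
    """
    name = f"{day:%Y%m%d}_{project.lower()}_{sample.lower()}_v{version:02d}.{ext.lower()}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"name violates SOP convention: {name}")
    return name


if __name__ == "__main__":
    print(sop_file_name(date(2024, 3, 5), "Alpha", "S01", 1, "csv"))
    # 20240305_alpha_s01_v01.csv
```

Rejecting non-conforming names at creation time, rather than renaming files later, keeps the naming consistent over time, which is the point of having an SOP in the first place.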
2 Dataset Documentation (Metadata)
Describing and documenting data is the only way to ensure that the data will be discoverable, searchable, and reusable in the future. This documentation is often called metadata and should include all information needed to understand and reuse the data.
Commonly, two types of documentation are distinguished (UK Data Archive – Document your data), which together ensure usability far into the future:
- Descriptive or study-level documentation
- Structural or data-level metadata
2.1 Metadata – Descriptive or study-level documentation
These metadata describe the content and purpose of the study comprehensively. They include the goal of the study, the design of the study with all of its experiments and measurements, its major findings, and its conclusions. Based on this metadata, it should be possible to understand the study undertaken and its outcome.
2.2 Metadata – Structural or data-level metadata
These metadata detail the actual measurements, including their samples and references, the SOPs employed, and the complete data analysis. Their purpose is to describe each measurement undertaken. They should be detailed enough that any other institution could fully reproduce the measurement results. The reproducibility of measurement results and of whole studies is one of the essential goals of research data management (RDM).
2.3 What metadata should contain
Documentation needs vary by project and by discipline. Many disciplines have developed metadata standards that specify what information should be collected. If no suitable standard exists, create a template that records all important details of the data. At a minimum, it should include the following fields:
- Creator: names and addresses of data creator(s)
- Publisher: addresses of data publishers
- Identifier: can be a permanent identifier or an internal project number
- Rights: Intellectual property or licensing rights for the data
- Access Information
- Project description (e.g. subject, scope etc.)
- Data Citation: Preferred format for citing data.
Data and file overview:
- Data Structure: including relationships between files
- File description: A short description of each file
- Dates: dates on which the files were created
- Measurement process: Description of measurement setup and measurement method
- Data capturing: Description of methods for data capturing
- Data processing: Description of methods for data processing (if data is not raw data)
- Variable list: with full names and definitions of column headings if tabular data
- Measured quantity (measurand)
- Units of measurement: SI units of the measurement result
- Location of measurement
- Definitions: Definitions for codes or symbols used to record missing information
This list is based on the MIT Documentation and Metadata guidance, the UK Data Archive Study Level Documentation, and Cornell University's Guide to writing "readme" style metadata. This metadata should be included as documentation in a README.txt file stored in the same folder as the data files.
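If no standard template is available, the minimum fields listed above can be turned into a README.txt skeleton automatically. The following is a minimal sketch, assuming the metadata is kept in a Python dictionary keyed by the field labels from the list; the helper names `render_readme` and `write_readme` and the section titles are hypothetical. Fields that have not been filled in are written as `TODO` so that gaps stay visible instead of silently disappearing.

```python
from pathlib import Path

# Field labels taken from the minimum list above; the two section
# titles used in the output are an illustrative choice.
GENERAL_FIELDS = [
    "Creator", "Publisher", "Identifier", "Rights",
    "Access Information", "Project description", "Data Citation",
]
FILE_FIELDS = [
    "Data Structure", "File description", "Dates", "Measurement process",
    "Data capturing", "Data processing", "Variable list",
    "Measured quantity", "Units of measurement", "Location of measurement",
    "Definitions",
]


def render_readme(metadata: dict[str, str]) -> str:
    """Render a plain-text README; missing fields are marked TODO, not dropped."""
    lines = ["GENERAL INFORMATION", ""]
    for field in GENERAL_FIELDS:
        lines.append(f"{field}: {metadata.get(field, 'TODO')}")
    lines += ["", "DATA AND FILE OVERVIEW", ""]
    for field in FILE_FIELDS:
        lines.append(f"{field}: {metadata.get(field, 'TODO')}")
    return "\n".join(lines) + "\n"


def write_readme(data_dir: Path, metadata: dict[str, str]) -> Path:
    """Write README.txt into the same folder as the data files."""
    path = data_dir / "README.txt"
    path.write_text(render_readme(metadata), encoding="utf-8")
    return path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # "Jane Doe, Example Institute" is a placeholder value
        readme = write_readme(Path(tmp), {"Creator": "Jane Doe, Example Institute"})
        print(readme.read_text(encoding="utf-8").splitlines()[2])
        # Creator: Jane Doe, Example Institute
```

Generating the file from a fixed field list keeps every README in a project in the same order and format, which makes them easier to compare and review.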
3 README Files
These metadata are often collected in readme files, which are plain text files (.txt) or sheets in a spreadsheet. A readme file helps others understand the research data and the relationships among the data files. By naming the file "readme", the data creator signals to users that this file should be read first. For researchers depositing data in a data repository, the information in the readme file supplements the information entered in the repository's metadata form. Furthermore, if the deposit includes multiple files, the readme explains the file naming structure, the relationships among the files, and the abbreviations used. Cornell University's Research Data Management Service Group provides a readme file template for download.
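Before a multi-file deposit, it can help to check that the readme actually mentions every file. The sketch below is a rough completeness check under stated assumptions: all data files sit directly in one folder next to a file called README.txt, and a file counts as "described" if its name appears anywhere in the readme text. The function name `undocumented_files` is hypothetical, and the check says nothing about the quality of the descriptions.

```python
from pathlib import Path


def undocumented_files(deposit_dir: Path, readme_name: str = "README.txt") -> list[str]:
    """Return names of files in deposit_dir that the readme never mentions.

    Only files directly in deposit_dir are checked (no subfolders), and
    matching is a plain substring search on the file name.
    """
    readme_text = (deposit_dir / readme_name).read_text(encoding="utf-8")
    return sorted(
        p.name
        for p in deposit_dir.iterdir()
        if p.is_file() and p.name != readme_name and p.name not in readme_text
    )


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "README.txt").write_text("run01.csv: raw sensor readings\n", encoding="utf-8")
        (d / "run01.csv").write_text("t,value\n", encoding="utf-8")
        (d / "run02.csv").write_text("t,value\n", encoding="utf-8")
        print(undocumented_files(d))  # ['run02.csv']
```

A non-empty result is a hint to extend the readme before depositing, not proof that the existing descriptions are sufficient.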
4 Metadata Standards
There are a number of community-maintained lists of disciplinary metadata standards:
- General collection or definition of metadata
- Biological science
  - Minimum Information for Biological and Biomedical Investigations (MIBBI)
  - MINimal information about high-throughput SEQuencing Experiments (MINSEQE) - genomics standard
- Ecological science
  - Ecological Metadata Language (EML) - specific to ecological disciplines
- Geographic information
  - ISO 19115-1:2014 Geographic information - Metadata - Part 1: Fundamentals
  - Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata (FGDC-CSDGM)
- Social, behavioral, economic, and health sciences
  - Data Documentation Initiative (DDI) - common standard for social, behavioral, and economic sciences, including survey data