A data dictionary is a type of metadata that lists, in an organized way, the names, definitions and characteristics of each field or attribute of a dataset. Its aim is to provide a common language between the author of the data and potential users. It also helps users understand and interpret a dataset by providing basic information about the fields or variables it contains. A data dictionary provides the following information:
- What each field or variable means.
- What kind of data it contains.
- What values it can take, and whether it draws on a catalog.
- Whether it contains public, confidential or reserved information.
Data dictionaries are designed to facilitate understanding and provide meaning; they must therefore document the existence, meaning and use of each element of the dataset.
Those responsible for the data must keep the contents of the data dictionary up to date, including definitions and values.
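As an illustration, a data dictionary can itself be stored as structured data, with one entry per field covering the four points listed above. The following is only a minimal sketch in Python: the field names (`age`, `sex`, `tax_id`), the `FieldEntry` class and the two helper functions are hypothetical and do not come from any particular standard or tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldEntry:
    """One data dictionary entry (hypothetical layout, for illustration only)."""
    name: str
    definition: str                          # what the field means
    data_type: str                           # what kind of data it contains
    allowed_values: Optional[tuple] = None   # catalog of values; None means no closed catalog
    sensitivity: str = "public"              # "public", "confidential" or "reserved"


# Hypothetical dictionary for a small dataset.
DATA_DICTIONARY = {
    "age": FieldEntry("age", "Age of the person in completed years", "integer"),
    "sex": FieldEntry("sex", "Sex as recorded in the source form", "string",
                      allowed_values=("F", "M")),
    "tax_id": FieldEntry("tax_id", "Tax identifier of the person", "string",
                         sensitivity="confidential"),
}


def undocumented_fields(record: dict, dictionary: dict) -> list:
    """Return the names of fields in the record that have no dictionary entry."""
    return sorted(name for name in record if name not in dictionary)


def invalid_values(record: dict, dictionary: dict) -> list:
    """Return the names of fields whose value is outside the documented catalog."""
    problems = []
    for name, value in record.items():
        entry = dictionary.get(name)
        if entry is not None and entry.allowed_values is not None:
            if value not in entry.allowed_values:
                problems.append(name)
    return sorted(problems)
```

A check such as `undocumented_fields(record, DATA_DICTIONARY)` is one simple way for those responsible for the data to notice that the dictionary has fallen behind the dataset.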
A codebook provides information about the structure, content, and layout of a data file. A well-documented codebook contains information that is intended to be complete and self-explanatory for each variable in a data file.
Although codebooks vary widely in the quality and quantity of information provided, a typical codebook includes:
- Column locations and widths for each variable.
- Definitions of the different record types.
- Response codes for each variable.
- Codes used to indicate non-response and missing data.
- Exact questions and skip patterns used in a survey.
- Other indications of the content and characteristics of each variable.
The body of a codebook describes the contents of the data file.
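To show how several of these codebook elements work together, the sketch below uses column locations, widths, response codes and missing-data codes to decode one line of a fixed-width data file. It is a minimal Python illustration under stated assumptions: the variables (`respondent_id`, `sex`, `satisfaction`), their positions and their codes are invented for the example, and a real codebook may lay this information out differently.

```python
# Hypothetical codebook: column locations are 1-based, as codebooks usually print them.
CODEBOOK = {
    "respondent_id": {"start": 1, "width": 4, "codes": None, "missing": set()},
    "sex": {
        "start": 5, "width": 1,
        "codes": {"1": "Male", "2": "Female"},
        "missing": {"9"},
    },
    "satisfaction": {
        "start": 6, "width": 2,
        "codes": {"01": "Low", "02": "Medium", "03": "High"},
        "missing": {"98", "99"},
    },
}


def decode_record(line: str, codebook: dict) -> dict:
    """Decode one fixed-width line into labelled values using the codebook."""
    decoded = {}
    for name, spec in codebook.items():
        begin = spec["start"] - 1                  # convert 1-based column to 0-based index
        raw = line[begin:begin + spec["width"]]
        if raw in spec["missing"]:
            decoded[name] = None                   # non-response or missing data
        elif spec["codes"] is not None:
            decoded[name] = spec["codes"].get(raw, raw)  # keep undocumented codes as-is
        else:
            decoded[name] = raw                    # uncoded variable, kept as text
    return decoded
```

With this hypothetical layout, `decode_record("0042299", CODEBOOK)` reads the identifier from columns 1 to 4, the sex code from column 5 and the satisfaction code from columns 6 to 7, and maps the missing-data code `99` to `None`.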
