Introduction

When planning a research project, it is important to decide which file formats will be used to store the data. The file formats you choose directly affect how easily research data can be shared and reused in the future.

The formats listed as recommended in the table offer the best long-term guarantees of usability, accessibility and sustainability. Accepted formats are widely used and will remain moderately usable and accessible in the long term.

The file formats most likely to be accessible have the following characteristics:

  • Non-proprietary
  • Open and documented standards
  • Used by the research community
  • Standard character encoding (ASCII, Unicode)
  • Unencrypted
  • Uncompressed
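
Some of the characteristics above can be checked automatically. The following is a minimal sketch, not a complete validator: it assumes that "standard representation" can be approximated by testing whether a file decodes as UTF-8 (ASCII is a subset of UTF-8), and that compression can be spotted from a few well-known file signatures. The function names and the signature table are illustrative choices, not part of any guideline.

```python
from typing import Optional

# File signatures ("magic numbers") of a few common compressed formats.
# Illustrative and deliberately incomplete.
COMPRESSED_SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"BZh": "bzip2",
}


def detect_compression(head: bytes) -> Optional[str]:
    """Return the name of a known compressed format, or None if no signature matches."""
    for signature, name in COMPRESSED_SIGNATURES.items():
        if head.startswith(signature):
            return name
    return None


def is_utf8_text(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (which includes plain ASCII)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_file(path: str) -> dict:
    """Read a file and report its detected compression and whether it is UTF-8 text."""
    with open(path, "rb") as handle:
        data = handle.read()  # reads the whole file; acceptable for a sketch
    return {
        "compression": detect_compression(data[:8]),
        "utf8_text": is_utf8_text(data),
    }
```

Note that several office formats are ZIP containers, so a "zip" result does not by itself mean the file was compressed by hand. Encryption cannot be detected reliably this way and is left out of the sketch.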
Good practices
  • Whenever possible, save data in an open, sustainable, non-proprietary format (proprietary software often lets you “save as” an open format without difficulty). If converting to an open format causes data loss, consider saving the files in both the proprietary and the open format. That way, at least part of the information remains accessible even if the proprietary format can no longer be opened.
  • When files must be saved in a proprietary format, consider including a readme.txt file that documents the name and version of the software used to generate the file, as well as the company that made the software. This information can be very helpful if you later need to open these files.
  • Several measures can be taken to avoid the risk of obsolescence and to ensure the accessibility and sustainability of files. One such measure is to use file formats that have a high probability of remaining usable for many years.
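
The readme.txt recommended above can be generated with a few lines of code. The sketch below is one possible layout, assuming the three items named in the guideline (software name, version, and vendor) plus an optional free-text note. The field labels, function names, and example values are hypothetical.

```python
from pathlib import Path


def build_readme(data_file, software, version, vendor, notes=""):
    """Return the readme text documenting the software behind a proprietary file."""
    lines = [
        f"File: {data_file}",
        f"Software used to generate the file: {software}",
        f"Software version: {version}",
        f"Software vendor: {vendor}",
    ]
    if notes:
        lines.append(f"Notes: {notes}")
    return "\n".join(lines) + "\n"


def write_readme(directory, **fields):
    """Write readme.txt into the given directory and return its path."""
    target = Path(directory) / "readme.txt"
    # Plain UTF-8 text keeps the readme itself in an open, standard format.
    target.write_text(build_readme(**fields), encoding="utf-8")
    return target
```

For example, `write_readme("dataset", data_file="survey.dat", software="ExampleStats", version="1.0", vendor="Example Software Inc.")` would place the readme next to the data, provided the `dataset` directory already exists. All values in that call are placeholders.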
Types of data
  • Observational: data captured in real time (e.g., neuroimaging, sample data, sensor data, survey data)
  • Experimental: data captured on laboratory equipment (e.g., gene sequences, chromatograms, magnetic field data)
  • Simulation: data generated from test models (e.g., climatological, mathematical or economic models)
  • Derived or compiled: data that can be reproduced, but only with difficulty (e.g., from text and data mining, 3D models, compiled databases)
Recommended and accepted formats

Table columns: Data formats | Recommended formats | Not recommended but commonly accepted formats

Data format categories listed in the table:

  • Text documents
  • Plain text
  • Markup languages
  • Programming languages
  • Spreadsheets
  • Databases
  • Statistical data
  • Images (bitmap)
  • Vector images
  • Audio (container and codec)
  • Audio (container)
  • Audio (codec)
  • Video (container)
  • Video (codec)
  • Computer-aided design (CAD)
  • Geographic information systems (GIS)
  • Georeferenced images
  • GIS raster
  • 3D
  • RDF
  • Computer-assisted qualitative data analysis (CAQDAS)

Table prepared from the format table of Data Archiving and Networked Services (DANS)

More information on formats is available in the Recommended Formats Statement of the Library of Congress.
