What are metadata and what are they for?

Metadata are the information recorded alongside the linguistic data themselves, in order to document the data and to make the corpus easier for other researchers to use. This information can vary widely: data sources, the software (and its exact version) used to collect or process the data, information about the speakers (age, gender, mother tongue…), the circumstances in which spoken or multimodal data were recorded, etc.
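As a rough illustration of the kinds of information listed above, the following minimal sketch stores them in a small Python structure and writes them out as JSON. Every field name and value here is invented for the example, and this ad hoc layout is not itself a recognized metadata standard; it only shows what such a record might contain.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Speaker:
    # Hypothetical speaker fields, mirroring the examples in the text.
    speaker_id: str
    age: int
    gender: str
    mother_tongue: str


@dataclass
class RecordingMetadata:
    # Hypothetical record describing one recording in a corpus.
    source: str
    software: str
    software_version: str
    situation: str
    speakers: List[Speaker] = field(default_factory=list)


# All values below are made up for illustration.
record = RecordingMetadata(
    source="example field recording",
    software="example-transcription-tool",
    software_version="1.2.3",
    situation="informal interview at the speaker's home",
    speakers=[
        Speaker(speaker_id="SPK01", age=34, gender="female", mother_tongue="French"),
    ],
)

# Serialize the record so it can be stored next to the data files.
metadata_json = json.dumps(asdict(record), indent=2, ensure_ascii=False)
print(metadata_json)
```

Keeping the record in a separate, machine-readable file next to the data makes it easier to convert later to whichever standard the community has adopted.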

A very important point: metadata should be standardized, i.e. expressed according to an international standard recognized by the scientific community. Because practices are still very heterogeneous today, CORLI is running a corpus enhancement initiative that aims to finalize the formatting of existing corpora in line with the FAIR (Findable, Accessible, Interoperable, Reusable) principles.

More resources on the CORLI website: