The parallel positioning of two states of a single text (e.g. a text and its translation, or a text produced at T1 and its latest version, produced at T2).


In this wiki, we use the term “annotation”, in the broad sense, to designate documentation of sub-parts of recordings (sentences, words, speech order…), as opposed to the term “metadata”, which designates the documentation of a recording as a whole. In the strict sense, the term “annotation”, which is the encoding of different kinds of information …

Annotation of under-described languages

Annotation of under-described languages minimally consists of a morphosyntactic gloss and a free translation. Each morpheme (lexical or grammatical) is linked to a tag which corresponds to a grammatical category (e.g. future, plural, antipassive) and/or a translation (for lexemes). For example, “il a fini”:

il         a             fini
SBJ.3SG.M  have.PRS.3SG  finish.PTCP.PST
Translation: “he has finished”. …
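The pairing of each form with its gloss can be sketched as a small data structure. This is a minimal illustration only: the names `GlossedToken` and `format_interlinear` are hypothetical and do not correspond to any particular annotation tool or file format.

```python
from dataclasses import dataclass


@dataclass
class GlossedToken:
    """One form and the gloss linked to it (hypothetical structure)."""
    form: str
    gloss: str


def format_interlinear(tokens, translation):
    """Return three lines: forms, glosses aligned under them, free translation."""
    # Each column is as wide as the longer of the form and its gloss.
    widths = [max(len(t.form), len(t.gloss)) for t in tokens]
    forms = "  ".join(t.form.ljust(w) for t, w in zip(tokens, widths)).rstrip()
    glosses = "  ".join(t.gloss.ljust(w) for t, w in zip(tokens, widths)).rstrip()
    return "\n".join([forms, glosses, f'"{translation}"'])


# The example from the entry above: "il a fini".
tokens = [
    GlossedToken("il", "SBJ.3SG.M"),
    GlossedToken("a", "have.PRS.3SG"),
    GlossedToken("fini", "finish.PTCP.PST"),
]
print(format_interlinear(tokens, "he has finished"))
```

Keeping the form and its gloss in one record, rather than in two parallel strings, means the alignment cannot drift when a token is added or removed.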

Annotation software

Digital data is often associated with symbolic (or sometimes analog) data allowing it to be described in detail and studied (for example the content of speech, the form of gestures, the gaze direction, etc.). In most cases, this symbolic data is associated with a certain time point or a certain time interval (we will use …

Annotation track (tier, track)

Some software allows events to be annotated on one or more tracks (sometimes called tiers), defined by the annotator, in which the annotated phenomena are temporally aligned with the audio and/or video signal. Tracks are used to annotate phenomena of different kinds on separate lines (see also Annotation scheme). Content validated by …
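The idea of several tracks aligned on the same timeline can be sketched as follows. This is a minimal illustration under assumptions: the classes `Annotation` and `Tier`, the tier names, and the time values are all hypothetical and do not reproduce the data model of any specific annotation software.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start: float  # seconds from the beginning of the signal
    end: float    # seconds from the beginning of the signal
    value: str


@dataclass
class Tier:
    """One annotation track; all annotations on it describe the same kind of phenomenon."""
    name: str
    annotations: list = field(default_factory=list)

    def add(self, start, end, value):
        if end < start:
            raise ValueError("an interval cannot end before it starts")
        self.annotations.append(Annotation(start, end, value))
        self.annotations.sort(key=lambda a: a.start)

    def at(self, t):
        """Return the values of the annotations whose interval contains time t."""
        return [a.value for a in self.annotations if a.start <= t < a.end]


# Two tiers for phenomena of a different nature (illustrative values only).
speech = Tier("speech")
speech.add(0.0, 1.2, "il a fini")
gaze = Tier("gaze")
gaze.add(0.5, 2.0, "towards interlocutor")

# Because both tiers share the signal's timeline, they can be queried together.
snapshot = {tier.name: tier.at(1.0) for tier in (speech, gaze)}
print(snapshot)
```

Storing each phenomenon on its own tier keeps the annotations independent while the shared time axis still allows them to be compared.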


Anonymization consists of removing all information which could be used to identify an individual, in the interest of sharing data without compromising privacy. This operation concerns the identifying information for a participant or place which would allow the participants to be identified, the audio or video signal, the transcription with personal information such …


In the sense of the Code du patrimoine, “archives are the set of all documents, whatever their date, their state of conservation, their form and their medium, produced or received by any natural or legal person, and by any service or organization, public or private, in the course of their activity” [translation of Article L211-1]. Digital documents are seen …


In the broad sense, archiving means managing the life cycle of information, from its creation to its deletion or permanent preservation. This management covers actions such as collection, classification, conservation, and communication. In the strict sense, archiving is the action of: “transferring documents which are no longer of current …


CLARIN: Common Language Resources and Technology Infrastructure. A European group serving linguistics and the social sciences and humanities, whose goal is to build an integrated and interoperable research infrastructure for linguistic resources and technologies. It is an important player in the standardization of descriptors. France has observer status.


CMDI: Component MetaData Infrastructure. A metadata format developed within CLARIN that allows, among other things, a hierarchical structure between metadata files to be documented. The current version of this format is 1.2.