CLARIN: Common Language Resources and Technology Infrastructure. A European consortium serving linguistics and the social sciences and humanities, whose goal is to build an integrated and interoperable research infrastructure for language resources and technologies. It is an important player in the standardization of descriptors. France has observer status.


CMDI: Component MetaData Infrastructure. A metadata format developed within CLARIN that makes it possible, among other things, to document a hierarchical structure between metadata files. The current version of this format is 1.2.
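As a rough illustration of the hierarchical structure mentioned above, the sketch below lists the metadata files that a CMDI record points to. It assumes the CMDI 1.2 envelope namespace (`http://www.clarin.eu/cmd/1`) and that links to other metadata records are `ResourceProxy` entries whose `ResourceType` is `Metadata`. The example record, its file names, and the function name `child_metadata_refs` are hypothetical; real records normally reference persistent identifiers rather than local file names.

```python
import xml.etree.ElementTree as ET

# Assumption: the CMDI 1.2 envelope namespace.
CMD_NS = {"cmd": "http://www.clarin.eu/cmd/1"}

def child_metadata_refs(cmdi_xml: str) -> list[str]:
    """Return the references of all ResourceProxy entries typed 'Metadata',
    i.e. links from this record to other (child) metadata files."""
    root = ET.fromstring(cmdi_xml)
    refs = []
    path = "cmd:Resources/cmd:ResourceProxyList/cmd:ResourceProxy"
    for proxy in root.findall(path, CMD_NS):
        rtype = proxy.findtext("cmd:ResourceType", default="", namespaces=CMD_NS)
        ref = proxy.findtext("cmd:ResourceRef", default="", namespaces=CMD_NS)
        if rtype.strip() == "Metadata" and ref.strip():
            refs.append(ref.strip())
    return refs

# Hypothetical parent record: two session records and one audio file.
PARENT = """<CMD xmlns="http://www.clarin.eu/cmd/1" CMDVersion="1.2">
  <Header/>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="s1">
        <ResourceType>Metadata</ResourceType>
        <ResourceRef>session-01.cmdi</ResourceRef>
      </ResourceProxy>
      <ResourceProxy id="s2">
        <ResourceType>Metadata</ResourceType>
        <ResourceRef>session-02.cmdi</ResourceRef>
      </ResourceProxy>
      <ResourceProxy id="a1">
        <ResourceType mimetype="audio/x-wav">Resource</ResourceType>
        <ResourceRef>recording.wav</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
  <Components/>
</CMD>"""

print(child_metadata_refs(PARENT))  # ['session-01.cmdi', 'session-02.cmdi']
```

Applying the same function recursively to each returned file would walk the whole hierarchy of a corpus, from the collection record down to individual sessions.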


Concordance

A method of text extraction based on the presentation of text samples that all contain the same word, sequence, or pattern. Some tools: ConcQuest, Unitex, Frantext, AntConc, Hyperbase, TXM.
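A minimal keyword-in-context (KWIC) sketch of this method, using only the Python standard library. The function `kwic` and its naive tokenizer (runs of word characters, so an elided form such as *l'usage* is split in two) are illustrative assumptions; the tools listed above offer far richer queries, sorting, and display.

```python
import re

def kwic(text: str, keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Keyword-in-context: for every occurrence of `keyword`, return
    (left context, keyword, right context), each context at most `width` tokens."""
    tokens = re.findall(r"\w+", text.lower())  # naive tokenizer: runs of word characters
    key = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == key:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

text = "The corpus is large. A corpus should have metadata. This corpus is open."
for left, kw, right in kwic(text, "corpus", width=2):
    # Right-align the left context so that every keyword lines up in one column.
    print(f"{left:>15} [{kw}] {right}")
```

Aligning the keyword in a single column is what makes a concordance readable: recurring left or right neighbours become visible at a glance.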

Consultation (right of consultation; means of consultation)

Anyone putting together a corpus must define its means of access or consultation. A corpus can have different means of consultation, ranging from access restricted to the researcher(s) and those involved in building it, to free access open to the public, notably via the internet. The importance of a precise definition of the means …


Context

Context plays a fundamental role in the use of language, whether spoken or signed, and determines certain universal properties of human language. It involves multiple dimensions: (1) the co-verbal/non-verbal aspects of the immediate situation, such as the spatio-temporal parameters that define the situation and, for oral language, gaze and any other bodily, facial, or gestural information …

Convention on annotation

The set of rules for encoding information (linguistic, contextual, gestural…) agreed upon for the annotation of a resource, so that a given event is represented consistently and unambiguously. It ensures the interoperability of annotations made by different annotators at different times. Some conventions have been developed within projects (e.g. PFC, Rhapsodie, LANGACROSS, …
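To make the idea of a shared convention concrete, the sketch below checks annotations against a convention expressed as a set of allowed labels per tier, plus one basic timing rule. The tier names, tag sets, and the `Annotation` and `check` names are hypothetical and are not taken from PFC, Rhapsodie, or any other project's conventions.

```python
from dataclasses import dataclass

# Hypothetical convention: each annotation tier and the labels allowed on it.
CONVENTION: dict[str, set[str]] = {
    "pos": {"NOUN", "VERB", "ADJ", "ADV", "DET", "PRON", "ADP"},
    "gesture": {"point", "beat", "iconic"},
}

@dataclass
class Annotation:
    tier: str
    start: float  # seconds from the start of the recording
    end: float
    label: str

def check(annotations: list[Annotation], convention: dict[str, set[str]]) -> list[str]:
    """Return one message per violation of the convention (empty list = consistent)."""
    errors = []
    for n, a in enumerate(annotations):
        if a.tier not in convention:
            errors.append(f"#{n}: unknown tier {a.tier!r}")
        elif a.label not in convention[a.tier]:
            errors.append(f"#{n}: label {a.label!r} not allowed on tier {a.tier!r}")
        if a.start >= a.end:
            errors.append(f"#{n}: start {a.start} is not before end {a.end}")
    return errors

annotations = [
    Annotation("pos", 0.0, 0.4, "NOUN"),
    Annotation("pos", 0.4, 0.7, "Verb"),       # wrong case: not in the tag set
    Annotation("gesture", 1.0, 0.8, "point"),  # end precedes start
    Annotation("gaze", 0.0, 1.0, "up"),        # tier not defined by the convention
]
for message in check(annotations, CONVENTION):
    print(message)
```

Because the convention is plain data, the same check can be run on annotations produced by different annotators or at different times, which is precisely what makes their results comparable.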


Co-occurrence

A method of exploration that consists of automatically identifying words that appear together in the same context more often than would be expected by chance, that is, in a statistically significant way. Some tools: TXM.
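The sketch below scores the co-occurrents of a target word, taking the sentence as the context. As a plainly swapped-in measure, it uses Dunning's log-likelihood ratio (G²) on a 2×2 contingency table; this is one common association score and is not necessarily the one TXM computes. The threshold 3.84 (the χ² critical value for p < 0.05 with one degree of freedom) and the function names are assumptions for illustration.

```python
import math
from collections import Counter

def g2(o11: int, o12: int, o21: int, o22: int) -> float:
    """Dunning's log-likelihood ratio for a 2x2 contingency table.
    Rows: target present / absent; columns: candidate present / absent."""
    n = o11 + o12 + o21 + o22
    observed = ((o11, o12), (o21, o22))
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # cells with o == 0 contribute 0 (limit of o*ln o)
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2 * total

def cooccurrents(sentences: list[list[str]], target: str, threshold: float = 3.84):
    """Return (word, G2) pairs for words that co-occur with `target` in the
    same sentence more often than expected, sorted by decreasing score."""
    contexts = [set(s) for s in sentences]
    n = len(contexts)
    with_target = [c for c in contexts if target in c]
    a = len(with_target)                             # contexts containing the target
    freq = Counter(w for c in contexts for w in c)   # contexts containing each word
    joint = Counter(w for c in with_target for w in c if w != target)
    scores = []
    for word, o11 in joint.items():
        o12 = a - o11            # target without word
        o21 = freq[word] - o11   # word without target
        o22 = n - a - o21        # neither
        score = g2(o11, o12, o21, o22)
        expected11 = a * freq[word] / n
        if score >= threshold and o11 > expected11:  # keep attractions only
            scores.append((word, round(score, 2)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

sentences = [["the", "corpus", "metadata"]] * 4 + [["the", "text", "word"]] * 4
print(cooccurrents(sentences, "corpus"))  # [('metadata', 11.09)]
```

In this toy example "the" appears in every sentence, so its co-occurrence with "corpus" is exactly what chance predicts (G² = 0) and it is filtered out, while "metadata" is retained.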


Corpus

A coherent set of data, not necessarily large in volume. At a minimum, a corpus should include, besides the folders that make it up, a metadata file (e.g. OLAC) that is visible to search engines.
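As a minimal sketch of such a metadata file, the function below builds an OLAC-style record whose body consists of Dublin Core elements. It assumes the OLAC 1.1 and Dublin Core namespaces shown in the code; the field values and the name `olac_record` are hypothetical. A record meant for actual harvesting would also need to validate against the OLAC schema (which uses coded values, for example for languages, that this sketch omits) and be exposed by an OLAC (OAI-PMH) data provider.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: OLAC 1.1 and Dublin Core elements 1.1.
OLAC_NS = "http://www.language-archives.org/OLAC/1.1/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("olac", OLAC_NS)
ET.register_namespace("dc", DC_NS)

def olac_record(fields: dict[str, list[str]]) -> str:
    """Build a minimal OLAC-style record: one dc:<name> element per value."""
    root = ET.Element(f"{{{OLAC_NS}}}olac")
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

record = olac_record({
    "title": ["Example corpus of spoken French"],  # hypothetical values
    "creator": ["Jane Doe"],
    "language": ["fra"],
    "description": ["Audio recordings with orthographic transcriptions."],
})
print(record)
```

Keeping the record in a separate file alongside the corpus folders lets it be published and indexed independently of the data themselves.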

Coverbal/non-verbal

In communication, whether oral or in sign language, a great deal of non-verbal information accompanies the use of language. Because it forms an inseparable part of communication, this information is called coverbal. Speech is thus situated in a context made up of various entities that define the universe of discourse in the immediate situation (interlocutors, …