Welcome to the CORLI consortium website!

CORLI is a network of laboratories and researchers working on language corpora. Its goal is to provide everyone with data, tools, documentation and training for the scientific use of language corpora, following the FAIR principles. CORLI is open to academics, researchers and engineers from all over the world, and to the study of all languages; its designation as a K Center by the European infrastructure CLARIN reflects this openness.

The CORLI consortium is particularly committed to gathering existing resources from the consortium’s laboratories, to helping the scientific community preserve its data over the long term and disseminate it more widely, and to providing additional means for networking projects validated by researchers in corpus linguistics.

More specifically, CORLI offers:

  • Documentation on tools, formats, good practices, legal aspects
  • Training in the use of tools and formats
  • User support through our K Center

CORLI is actively working on three projects that meet community needs:

  • Collaborative annotation
  • Citation of corpora or corpus extracts
  • Open French Corpus project: corpus data and tools for the French language

The CORLI consortium is coordinated by Christophe Parisse and Céline Poudat and managed by the MESHS of Lille (House of Human Sciences). It brings together academics, researchers and engineers in linguistics, and federates teams and laboratories involved in the production and processing of written, oral or multimodal digital corpora.

To stay informed of the consortium’s activities, subscribe to the mailing list.

CORLI is open to everyone!

You can join one of the network groups, or contact us directly via this form.

All laboratories are invited to participate: you can contact the CORLI Steering Committee or subscribe to our mailing list.


Metadata (part 1)

Metadata
Coordination: Carole Etienne

PART 1 – FACILITATING THE REUSE OF CORPORA BY OTHER RESEARCHERS

Why would a researcher reuse a corpus?

  • To have access to a larger volume of data
  • To explore the same data from different perspectives: syntactic, prosodic, phonological or interactional analyses of the same data
  • To benefit from different sets of annotations that ... Read more

Metadata (part 2)

Metadata currently available for oral corpora

Analysis of existing metadata, notably through the ORFEO project (3.5 million words, 14 data sources). The metadata are very heterogeneous, both in format:

  • Text files (PDF, Word)
  • Tabular files (Excel, CSV)
  • XML (Dublin Core/OLAC, TEI Header, CMDI)

and in content. Basic fields: duration, age, ... Read more
