CORLI projects 2022-2025

Annotation project

Transcription and annotation of corpora are core tasks in the digital humanities, and collaborative annotation of corpora is a central part of the new CORLI project.

Three areas of focus are currently being pursued:

  • Providing a platform for transcription and simple annotation of language data – a deliverable produced as part of the Palamède project, born of a collaboration initiated in 2020-2021 between the Lorraine House of Human Sciences (MSH Lorraine), CORLI, Huma-Num, Atilf and Lit&Art as well as various transcription tool designers, starting with TACT;
  • Providing a high-level annotation platform with active learning functionalities, in collaboration with the INCEpTION team (TU Darmstadt); this collaboration started in 2020;
  • Building a collaborative, layered annotation resource based on the GUM model, with the participation of the CLLE, Loria, Lidilem and BCL labs.

A first step in designing the annotation platform and creating a collaboratively annotated resource led to a university project and to CORLI workshops in May 2022.

Citation project

Once the corpora are in FAIR formats, they are intended to be used and reused for open research. The CITATION project aims to build tools that let users create and use citations of corpora or corpus extracts. The citations themselves will follow existing standards or standards proposed by the RDA (Research Data Alliance). The tools will allow users to:

  • Select, from deposited corpora or from the Open French Corpus, the passages that make up a corpus extract
  • Create persistent web pages for viewing or presenting a corpus or a corpus extract (either previously selected or manually inserted)
  • Generate bibliographic references that point to these persistent web pages and can be inserted in the body and in the references section of a scientific text (in RIS, BibTeX, etc., and therefore usable in reference management software such as Zotero).
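
As an illustration of the last point only, the sketch below shows how a reference to a corpus extract could be serialized as BibTeX and RIS. It is not the project's tool: the CorpusExtract class, its fields, the to_bibtex and to_ris helpers, and the example URL are all hypothetical, and the choice of the @misc entry type and the RIS DATA type is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CorpusExtract:
    """Hypothetical description of a corpus extract (illustrative fields only)."""
    key: str            # citation key chosen by the user
    authors: list[str]  # "Last, First" strings
    title: str
    year: int
    url: str            # persistent web page presenting the extract


def to_bibtex(extract: CorpusExtract) -> str:
    """Serialize the extract as a BibTeX @misc entry (assumed entry type)."""
    authors = " and ".join(extract.authors)
    return (
        f"@misc{{{extract.key},\n"
        f"  author = {{{authors}}},\n"
        f"  title = {{{extract.title}}},\n"
        f"  year = {{{extract.year}}},\n"
        f"  url = {{{extract.url}}}\n"
        f"}}"
    )


def to_ris(extract: CorpusExtract) -> str:
    """Serialize the extract as a RIS record of type DATA (assumed type)."""
    lines = ["TY  - DATA"]
    lines += [f"AU  - {author}" for author in extract.authors]
    lines += [
        f"TI  - {extract.title}",
        f"PY  - {extract.year}",
        f"UR  - {extract.url}",
        "ER  - ",  # end-of-record tag
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values; the URL is not a real corpus page.
    example = CorpusExtract(
        key="doe2022extract",
        authors=["Doe, Jane"],
        title="Example corpus extract",
        year=2022,
        url="https://example.org/corpus/extract/1",
    )
    print(to_bibtex(example))
    print(to_ris(example))
```

Either output can be saved to a file and imported into a reference manager such as Zotero, which reads both formats.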

Corpucit is in line with the FAIR principles of open science and with the use of data papers. It will improve access to, and the visibility of, the work of creating and depositing corpora.

A more detailed presentation of the project and its objectives is available for download.

Open French Corpus project

This project aims to centralize existing French corpora from various projects, all validated and normalized by the community, and to make them available in a common space together with appropriate tools for using them. The project has three phases, which can be carried out in parallel:

  • Identify and gather existing corpora as well as methods, techniques and formats used to build them
  • Determine a minimum core of requirements for the format, quality and preparation of the corpora to be made available, and present a processing chain for standardizing new corpora or upgrading older ones
  • Make corpora available for download, full-text search and tool-based search


Workshop organized by the CORLI and CAHIER consortiums on May 24, 2022 in Nice.

All CORLI projects follow the same policy: use existing tools or data and avoid, as far as possible, developing technologies from scratch. Instead, CORLI focuses on making existing technologies better known and on creating bridges (carrying out the necessary development work where needed) or documentation to make them more accessible or better exploited.