Open French Corpus Project

This project aims to centralize existing French corpora from various projects, all approved and normalized by the community, and make them available in a shared space with suitable tools to use them. This project has three phases that can be carried out simultaneously:

  • Identify and gather existing corpora, along with the methods, techniques and formats used to build them
  • Establish minimum requirements for the format, quality and preparation of corpora to be made available; present a processing chain to standardize new corpora or update older ones
  • Make corpora available for download, full-text search, and tool-assisted querying

Metadata

  • Listing of existing corpora eligible for OFC, standardization of metadata
  • Creation of a pilot corpus and metadata comparison – in cooperation with the ARIANE (CAHIER) consortium

Three areas of work

  1. State of the art and work on existing corpora
  2. Evaluation of Opentheso and work on metadata
  3. Creation of a working group on situation-of-enunciation parameters

Opentheso evaluation

Situation of enunciation parameters

Events & resources