This project aims to bring together existing French corpora from various projects, approved and normalized by the community, and to make them available in a shared space together with suitable tools for using them. The project has three phases that can be carried out simultaneously:
- Identify and gather existing corpora, as well as the methods, techniques and formats used to build them
- Establish a minimum core of format, quality and preparation requirements for the corpora to be made available; present a processing chain to standardize new corpora or update older ones
- Make corpora available for download, full-text search, and tool-based search
Metadata
- Listing of existing corpora eligible for OFC, standardization of metadata
- Creation of a pilot corpus and metadata comparison – in cooperation with the ARIANE (CAHIER) consortium
Three areas of work
- State of the art and work on existing corpora
- Evaluation of Opentheso and work on metadata
- Creation of a working group on situation-of-enunciation parameters
Opentheso evaluation
Events & resources
- Working meeting, Nov. 25, 2022
- Workshop, Dec. 9, 2022
- Working meeting, Feb. 3, 2023