Evaluation of Corpora
Team leaders: Christophe Parisse, Céline Poudat
Assembling resources is now one of the major undertakings of many researchers and laboratories, and building them directly affects how research is structured and how scientific work is divided. As these practices develop and scientific argumentation relies increasingly on data, whether to construct observable facts or to test hypotheses, the resulting resources need to be counted among the outputs of research units. Collecting usable resources is necessary for certain working methods and for several subfields of linguistics, but corpus building is, above all, a mode of research that is in full swing. Defining criteria for the evaluation of resources calls for careful thought about data storage, and it also lets research progress with the transparency and reusability of data in mind.
On September 23, 2016, the CORLI steering committee organized a seminar on the evaluation of corpora. The seminar was an opportunity to exchange views and reflect on criteria for the evaluation of resources. The following presentations were given:
- Franck Neveu – Evaluation des corpus et institutions d’évaluation de la recherche
- Christophe Parisse – Pourquoi déposer ses corpus ?
- Jean-Marie Pierrel – Pourquoi évaluer la qualité d’un corpus ?
- Thierry Chanier – Critères de qualité pour les données de la recherche en général. Transposition aux corpus en linguistique
- Michel Jacobson – Comment décrire un corpus à des fins d’archivage ?
- Olivier Baude – Évaluer la qualité des corpus ? Pourquoi ? Comment ?
- Carole Etienne – Interopérabilité et métadonnées quels besoins dans un projet de recherche
On October 3, 2019, CORLI organized a seminar on the evaluation of corpora. The theme is particularly important today, partly because it contributes to a better evaluation of work done in linguistics, and partly because it brings to linguistics the data-sharing practices already established in many other disciplines (the FAIR movement).
The seminar consisted of presentations by Helene Andreassen (Norway), Gabriel Bergounioux, Bernard Laks, François Rastier, and Mathieu Valette, who represent many of the organizations involved in corpus linguistics research.
Contents of the presentations of the 2019 seminar.