What are the legal and ethical issues involved in collecting data and making it available in a corpus?

Sharing resources is essential in an open science approach, as promoted by CORLI. When the data collected to build a corpus come from speakers, and thus from individuals, personal information and intellectual property should be protected. In some cases, the data relevant for linguistic analysis are directly identifying (information on the speaker, voice, image…) or even sensitive (opinions, …

Should I anonymize my corpus?

If the corpus includes personal data (i.e. data that make a person directly or indirectly identifiable), publishing the corpus, whether as extracts or in its entirety, requires prior anonymization of the textual, oral, or audiovisual data. Otherwise, usage restrictions will be necessary (to be defined with the competent data protection representative).

What guidelines and texts regulate the creation and use of corpora?

Guidelines: Before creating a corpus, it is recommended to establish a data management plan and to follow the FAIR principles (to produce Findable, Accessible, Interoperable, Reusable data). Before using any corpus, it is recommended to check the associated licenses and, if possible, to contact the producers or managers of the corpus to find out about any …