What are the legal and ethical issues involved in collecting data and making it available in a corpus?

Sharing resources is essential to the open science approach promoted by CORLI. When the data collected to build a corpus come from speakers, and therefore from individuals, personal information and intellectual property must be protected. In some cases, the data relevant to linguistic analysis are directly identifying (information on the speaker, voice, image…) or even sensitive (opinions, origins, health, etc.). A balance must therefore be found so that corpora can be disseminated in compliance with legislation and ethical principles. The objective of the QuECJ network-group is to inform and support the community on these issues.

On this page, you will find various documents on legal best practices.