The data used in corpus linguistics can take many forms: written or spoken data, but also video, motion capture, eye-tracking recordings, etc. Data acquisition for a corpus must be planned carefully in advance, and the method used must be clearly defined and documented to ensure traceability. In particular, three questions must be addressed before collection starts: what equipment is required (in the case of recordings), which tools are needed, and which metadata will be associated with the collected data.
More information is available on the CORLI website:
- The page "Best practice in corpus building" lists good and bad practices in corpus building.
- Several training courses dedicated to corpus building, especially for multimodal corpora, have been organized at CORLI. They are listed on the page "Corpus building – Training courses and materials".
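
As an illustration of the metadata and traceability requirement described above, here is a minimal Python sketch of a metadata record that could accompany each collected item. It is not a CORLI recommendation or an established schema: the class `RecordingMetadata`, its field names, and the helper `to_sidecar_json` are all hypothetical, chosen only to show how equipment, tools, and collection details might be recorded and checked for completeness.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RecordingMetadata:
    """Hypothetical metadata record for one collected item (not a standard schema)."""

    item_id: str
    modality: str        # e.g. "written", "oral", "video", "eye-tracking"
    collected_on: str    # date of collection, here as an ISO 8601 string
    equipment: List[str] = field(default_factory=list)  # recording devices used
    tools: List[str] = field(default_factory=list)      # software used
    protocol_notes: str = ""                            # how the data was collected

    def missing_fields(self) -> List[str]:
        """Return the names of fields that were left empty."""
        return [name for name, value in asdict(self).items() if not value]


def to_sidecar_json(meta: RecordingMetadata) -> str:
    """Serialize the record as JSON text, e.g. to store next to the data file."""
    return json.dumps(asdict(meta), indent=2, ensure_ascii=False, sort_keys=True)


# Illustrative values only; they do not describe a real recording.
meta = RecordingMetadata(
    item_id="rec-001",
    modality="oral",
    collected_on="2023-05-14",
    equipment=["handheld recorder"],
)

print(meta.missing_fields())
print(to_sidecar_json(meta))
```

Listing the empty fields before archiving an item is one simple way to notice undocumented equipment or tools while the information is still easy to recover.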