What are the main steps of an annotation campaign?

If you want to annotate a corpus, here are the main steps you should follow: Check that your corpus is submitted in an editable, open and non-proprietary format such as .txt, .xml or .json. Documents in formats such as .doc, .docx or .pdf should be prepared for annotation. Establish an annotation scheme: define objects to be annotated … Read more

What tools are available for corpus annotation?

Many tools dedicated to corpus annotation are listed on the software inventory page; to get the complete list, you can filter tools by type (Type=Annotation). Some have been presented during training sessions offered by CORLI: ELAN, software for creating complex annotations on video and audio resources; Glozz, an annotation and exploration environment for textual … Read more

How to write an annotation guide?

Any annotation project should be accompanied by an annotation guide detailing the decisions made regarding corpus annotation, the linguistic objects to be identified by annotators, the categories that can be assigned to them, etc. To write an annotation guide, it may be useful to consult existing guides from the literature. This … Read more

How to assess annotation quality?

To check the quality of annotations, it is essential to evaluate the inter-annotator agreement. To do this, we compare the annotations of multiple annotators to whom we have submitted the same data. The most common measure used to evaluate inter-annotator agreement is Cohen’s Kappa.
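As an illustration of the comparison described above, here is a minimal sketch of Cohen's Kappa for two annotators who labelled the same items. The function name `cohen_kappa` and the sample part-of-speech labels are hypothetical and chosen only for this example. The measure is computed as (observed agreement - chance agreement) / (1 - chance agreement).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty list of items.")
    n = len(labels_a)

    # Observed agreement: share of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators' label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label everywhere; treated here as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical sample data: part-of-speech labels given by two annotators to six tokens.
annotator_1 = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB"]
annotator_2 = ["NOUN", "VERB", "ADJ", "ADJ", "NOUN", "NOUN"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

With this sample data the two annotators agree on 4 of 6 items, chance agreement is 13/36, and the resulting Kappa is 11/23 (about 0.48). Cohen's Kappa is defined for exactly two annotators; campaigns with more annotators would need a different measure.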

What is corpus annotation?

Annotating a corpus means adding one or more layers of linguistic interpretation to raw data. The annotations added can be very diverse in nature: morpho-syntactic categories, semantic or discursive annotations, but also, in the case of oral or multi-modal corpora, information on prosody, gestures, etc. Annotations are performed during annotation campaigns by human … Read more

What tools are available for oral or multimodal corpus annotation?

Various tools dedicated to oral or multimodal corpus annotation are listed in the software inventory section; to get the complete list, you can filter the tools by type (Type=Analysis) and by type of data (Data=Audio/Video). Some have been demonstrated during training sessions organized by CORLI, including: ELAN, software for creating complex annotations on … Read more

How to use the INCEpTION platform?

As part of the Annotation project (CORLI 2022-2025) and a student project within the LITL Master’s degree in Linguistics (Toulouse, France), we have created a set of files for getting started with annotation on the INCEpTION platform. The student project’s objective was to participate in the design of a high-level annotation platform with active learning … Read more