What are the main steps of an annotation campaign?

If you want to annotate a corpus, here are the main steps you should follow:

  • Check that your corpus is in an editable, open and non-proprietary format such as .txt, .xml or .json. Documents in other formats (.doc, .docx, .pdf, etc.) must first be converted before they can be annotated
  • Establish an annotation scheme: define the objects to be annotated (units, relations, complex structures), the types of linguistic units involved (characters, words, statements, paragraphs, undefined units), and the characteristics to be associated with the annotated objects
  • Choose an annotation tool (if possible, after having tested several)
  • Write the annotation guide
  • Test the guide with several people on the same text
  • Compare annotations to stabilize the final version of the guide
  • Select and train annotators (it is a good idea to have them produce a first trial annotation that can be compared with a reference version, for example on the text used to stabilize the guide)
  • Annotate
  • Check annotation quality, in particular by calculating inter-annotator agreement
  • If possible, provide an adjudicated version (a reference version in which disagreements have been resolved)
  • Describe the collected annotations
  • If possible, enrich the annotation guide with new examples (including cases of uncertainty and disagreement) and with feedback from the annotators
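
As an illustration of the quality-check and adjudication steps above, here is a minimal Python sketch. It assumes two annotators have each assigned exactly one categorical label to the same sequence of items. It uses Cohen's kappa, which is only one of several possible agreement measures and is not prescribed by the steps above. The function names `cohen_kappa` and `disagreements` and the labels in the example are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: one categorical label per item, items in the same order.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty list of items")
    n = len(labels_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one single identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreements(labels_a, labels_b):
    """Indices and label pairs on which the annotators differ (candidates for adjudication)."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


if __name__ == "__main__":
    # Hypothetical labels for six annotated units.
    annotator_a = ["PER", "LOC", "PER", "ORG", "LOC", "PER"]
    annotator_b = ["PER", "LOC", "ORG", "ORG", "LOC", "PER"]
    print(round(cohen_kappa(annotator_a, annotator_b), 2))  # 0.75
    print(disagreements(annotator_a, annotator_b))          # [(2, 'PER', 'ORG')]
```

The list returned by `disagreements` can serve as a starting point for the adjudicated version and as a source of difficult examples for the annotation guide. For more than two annotators, or for annotations with differing unit boundaries, another measure would be needed.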