In this wiki, we use the term “annotation” in the broad sense to designate the documentation of sub-parts of a recording (sentences, words, speech order…), as opposed to the term “metadata”, which designates the documentation of a recording as a whole.
In the strict sense, “annotation” is the encoding of different kinds of information (glosses, gestures, waypoints, morpho-syntactic analyses, …), as opposed to “transcription”, which refers to the phonetic or orthographic encoding of the produced speech.
-Stand-off annotation: the recommended practice of encoding annotations separately from the primary data. Generally, each kind of annotation (prosodic, morphological, syntactic, etc.) is encoded in its own file. The relation between the different kinds of data is indicated by the alignment system. This practice allows each kind of data to be worked on separately from the others.
-POS-tagging: POS tagging consists of associating each token with a set of information comprising its part of speech (N, V, Det, etc.) and a set of features (plural, masculine, etc.). These features can be quite detailed and may also include sub-category information or semantic features. Other information, such as the lemma, the phonetic form or the frequency, can also be indicated.
-Syntactic annotation (parsing): syntactic annotation consists of adding information about the syntactic structure, that is, the syntactic units and the relationships between them. A distinction is made between shallow and deep annotation. Shallow annotation is a matter of identifying chunks, or sequences of tokens belonging to a single group, without overlapping or hierarchical structure. For example, the sequence Det+N constitutes a nominal chunk. Deep annotation consists of associating a full syntactic structure with a sentence. Generally, annotations take the form of constituents or of dependencies. Syntactically annotated corpora are generally called treebanks.
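The three entries above can be illustrated together with a minimal Python sketch. It is only an illustration under stated assumptions, not a format used by any of the tools listed on this page: the example sentence, the layer names (`token_layer`, `pos_layer`, `chunk_layer`), the identifiers (`t1`, `t2`, …) and the feature labels are all hypothetical. Each layer is kept in its own structure (in practice, its own file) and refers to the primary text only through character offsets or token identifiers, which play the role of the alignment system.

```python
# Primary data: never modified; annotation layers only point into it.
primary_text = "The cat sleeps on the mat"

# Layer 1 (tokenization): character offsets into the primary text.
token_layer = [
    {"id": "t1", "start": 0, "end": 3},
    {"id": "t2", "start": 4, "end": 7},
    {"id": "t3", "start": 8, "end": 14},
    {"id": "t4", "start": 15, "end": 17},
    {"id": "t5", "start": 18, "end": 21},
    {"id": "t6", "start": 22, "end": 25},
]

# Layer 2 (POS tagging): part of speech, lemma and features per token id.
# The tag and feature names are illustrative assumptions.
pos_layer = {
    "t1": {"pos": "Det", "lemma": "the", "features": {}},
    "t2": {"pos": "N", "lemma": "cat", "features": {"number": "singular"}},
    "t3": {"pos": "V", "lemma": "sleep", "features": {"number": "singular"}},
    "t4": {"pos": "Prep", "lemma": "on", "features": {}},
    "t5": {"pos": "Det", "lemma": "the", "features": {}},
    "t6": {"pos": "N", "lemma": "mat", "features": {"number": "singular"}},
}


def token_text(token):
    """Recover the surface form of a token from the primary text."""
    return primary_text[token["start"]:token["end"]]


def nominal_chunks(tokens, pos):
    """Shallow annotation: group each adjacent Det+N pair into a nominal chunk.

    Chunks are flat and non-overlapping; other tokens are left unchunked.
    """
    chunks = []
    i = 0
    while i < len(tokens) - 1:
        first, second = tokens[i], tokens[i + 1]
        if pos[first["id"]]["pos"] == "Det" and pos[second["id"]]["pos"] == "N":
            chunks.append({"type": "NC", "tokens": [first["id"], second["id"]]})
            i += 2
        else:
            i += 1
    return chunks


# Layer 3 (shallow syntax): refers to token ids, not to the text itself.
chunk_layer = nominal_chunks(token_layer, pos_layer)

# Resolve the chunk layer back to text through the token layer.
by_id = {t["id"]: t for t in token_layer}
chunk_texts = [
    " ".join(token_text(by_id[tid]) for tid in chunk["tokens"])
    for chunk in chunk_layer
]
print(chunk_texts)
```

Because the POS and chunk layers only refer to token identifiers, either one can be revised or replaced without touching the primary text or the other layer, which is the point of the stand-off approach.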
Other kinds of annotation are possible: prosody, named entities, coreference chains, thematic roles, discourse relations between discourse units, lexical disambiguation, emotions, opinions, etc. See sign-language annotation.
Some tools: NooJ, Glozz, Analec, Le Trameur, The Sketch Engine