CORLI Training sessions – Analysis of textual data, corpora manipulation, data extraction, exploration

Date(s) - 16/05/2023 - 17/05/2023
9:30 am - 5:00 pm

Université Paris Cité




📚 Tuesday, May 16

  • 9:30 am to 12:30 pm

Title: TXM (beginner)

Speaker(s): Achille FALAISE, Loïc LIÉGEOIS

Where? Room 43XC, Halle aux Farines building

Content: TBA

  • 2 pm to 5 pm

Title: TXM (Advanced): non-statistical features

Speaker(s): Bénédicte PINCEMIN

Where? Room 436C, Halle aux Farines building

Content: This session will focus on basic features (no specific mathematical knowledge is required) such as concordance, index and progression, as well as other features participants may already have used. The way these features are implemented allows for advanced corpus exploration methods that are less well known but useful in practice.

  • 2 pm to 5 pm

Title: INCEpTION: a collaborative platform for linguistic annotation

Speaker(s): Lydia-May HO-DAC, Céline POUDAT

Where? Room 432C, Halle aux Farines building

Content: Annotating a corpus means adding one or more layers of linguistic interpretation to raw data. Added annotations can be morpho-syntactic, semantic or discursive, as well as prosodic or gestural in the case of spoken or multimodal corpora.

Annotation is carried out during annotation campaigns by human annotators with varying levels of expertise, who rely on annotation guidelines and tools.

The CORLI consortium has chosen the INCEpTION platform as one of these tools, in order to provide the community with a documented platform for collaborative annotation.

INCEpTION offers features for multi-layer annotation, both embedded (inline) and stand-off, organized into what it calls projects, which facilitate collaborative annotation. Collaborative annotation in INCEpTION includes:

  1. management of annotator cohorts (role assignment),
  2. text assignment,
  3. adjudication and measurement of inter-annotator agreement.

No specific prerequisites are necessary. Participants can bring their own computer or use the available computers.
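
To give an idea of what measuring inter-annotator agreement involves, here is a minimal Python sketch of Cohen's kappa for two annotators. It is an illustration only: the function name, the label lists and the choice of kappa are assumptions made for this example, not a description of how INCEpTION computes agreement or of what the session will cover.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one single identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical part-of-speech labels, invented for this example.
annotator_1 = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB"]
annotator_2 = ["NOUN", "VERB", "ADJ", "ADJ", "NOUN", "NOUN"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance; adjudication is then used to resolve the items on which annotators disagree.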

📚 Wednesday, May 17

  • 9:30 am to 12:30 pm

Title: Extracting data from corpora – XSLT

Speaker(s): Alexey LAVRENTEV

Where? Room 43XC, Halle aux Farines building

Content: This workshop will focus on the XPath and XSLT languages for XML data processing. We’ll use a sample TEI corpus to practice tokenization, building an index of proper names, and exporting annotations in a tabular format (CoNLL).

Prerequisites: Oxygen XML Editor (alternatively XSL Transform) and TXM software.
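
For readers who want a feel for the kind of task involved, the sketch below builds a proper-name index and a tabular export from a tiny TEI fragment. It uses Python's standard `xml.etree.ElementTree` module, which supports only a subset of XPath, rather than the XSLT taught in the workshop. The sample fragment, the `pos` attribute and the function names are assumptions invented for this example, and the four-column output is a simplification, not the full CoNLL format.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI fragment, invented for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <s n="1"><persName><w pos="PROPN">Roland</w></persName> <w pos="VERB">rides</w> <w pos="ADP">to</w> <placeName><w pos="PROPN">Saragossa</w></placeName></s>
    <s n="2"><persName><w pos="PROPN">Roland</w></persName> <w pos="VERB">returns</w></s>
  </body></text>
</TEI>"""


def name_index(root):
    """Count the occurrences of each personal or place name."""
    counts = Counter()
    for tag in ("persName", "placeName"):
        # ".//tei:persName" selects every persName descendant of the root.
        for elem in root.iterfind(f".//tei:{tag}", TEI_NS):
            counts["".join(elem.itertext()).strip()] += 1
    return counts


def to_tabular(root):
    """One tab-separated line per token: sentence, position, form, POS."""
    rows = []
    for sent in root.iterfind(".//tei:s", TEI_NS):
        for i, w in enumerate(sent.iterfind(".//tei:w", TEI_NS), start=1):
            rows.append("\t".join([sent.get("n"), str(i), w.text, w.get("pos")]))
    return "\n".join(rows)


root = ET.fromstring(SAMPLE)
index = name_index(root)
table = to_tabular(root)
print(index)
print(table)
```

An XSLT stylesheet would express the same selections as template match patterns and `xsl:for-each` loops over comparable XPath expressions.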

  • 9:30 am to 12:30 pm

Title: IRaMuTeQ for beginners: an introduction

Speaker(s): Lucie LOUBERE

Where? Room 43XC, Halle aux Farines building

Content: The training session will demonstrate the types of corpora that can be analyzed with IRaMuTeQ, as well as the textual data analyses the software offers. The analyses themselves will not be presented in detail; instead, we will focus on various features useful for research on textual data.

Prerequisites: none. Installing IRaMuTeQ is not mandatory, but we’ll provide participants with an installation guide and a sample corpus.

  • 2 pm to 5 pm

Title: Introduction to statistics with R

Speaker(s): Olivier CROUZET

Where? Room 43XC, Halle aux Farines building

Content: TBA




Université Paris Cité – Campus Grands Moulins
8 Place Paul Ricoeur, 75013 Paris
Halle aux Farines building

Training sessions will take place in rooms 432C and 436C.

Google Maps link

Please register here or scan the QR code below!


Oxygen XML alternatives:

  • A free trial version is available on the Oxygen XML Editor website (requires an email address)
  • If you’d rather not use the trial version, can be used during the training session