GP3 – Multilingual

Group project 3: Multilingual and Plurilingual corpora

Team leaders: Antonio Balvet (Université de Lille), Natalie Kübler (Université Paris Diderot) and Maria Zimina (Université Paris Diderot)

This work group brings together researchers working on multilingual and plurilingual corpora, whether written or oral, in languages with either written or oral traditions. Its goal is to exchange methods and tools used in other domains, as well as the theoretical aspects central to each tradition. In particular, we will reflect together on the following points:

  • Building written and oral corpora for major languages vs. building oral corpora for less described languages: what are the tools, what are the annotators, what are the research priorities?
  • Quantitative exploitation of large corpora vs. quantitative exploitation of smaller corpora of lesser-studied languages: which statistical models should be used, what are the theoretical questions and what are the methods?

The group’s objectives are to organize classes on the tools used for, or adapted to, multilingual or plurilingual corpora, as well as courses on possible statistical treatments. It organizes seminars that bring together researchers working on multilingual and plurilingual written and oral corpora. Finally, the group encourages the promotion of existing corpora via specific annotations for languages in contact.

2017 Events

A day of scientific conferences was held on September 15, 2017 on the CNRS campus at Villejuif (org. Evangelia Adamou, Antonio Balvet, Natalie Kübler and Maria Zimina). This event brought together members of the scientific community who are interested in the challenges of creating and analyzing multilingual and plurilingual corpora, both written and oral.

2018 Events

A day of scientific conferences, called “Cross-lingual analysis and the annotation of parallel and comparable multilingual corpora: current and future trends”, was held on November 30, 2018 at the Université Paris Diderot (org. Natalie Kübler, Maria Zimina, Evangelia Adamou and Antonio Balvet). The principal goal of the conference was to bring together researchers and professionals from different theoretical schools and different domains.

  • The program of the conference was as follows:
    • Invited speakers: Vesna Lušicky (University of Vienna) and Tanja Wissik (Austrian Academy of Sciences). Overview of the CLARIN multilingual resources.
  • Peer-reviewed abstracts:
    • Natalia Levshina (Leipzig University). Multilingual parallel corpora and semantic maps: traditional and new approaches.
    • Evangelia Adamou (CNRS). Endangered languages on a scale of language mixing.
    • Efstathia Soroli (Université de Lille and CNRS) and Cathy Cohen (Université Lyon 1 and CNRS). Bilingual Discourse Analysis (BilDA): Research methods in second language acquisition and bilingualism – a manual for transcription, coding and analysis.
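    • Yuming Zhai (CNRS). Construction d’un Corpus Multilingue Annoté en Relations de Traduction [Construction of a multilingual corpus annotated with translation relations].
    • Monika Chwalczuk (Paris Diderot). Saisir le multimodal : Les défis d’annotation d’un corpus d’interactions interprétées dans les services publics [Capturing the multimodal: the challenges of annotating a corpus of interpreted interactions in public services].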