
Corpus complexes et standards : un retour sur le projet CoMeRe

Abstract : This contribution reviews the national research project CoMeRe (Communication Médiée par les Réseaux, or Network-Mediated Communication) and focuses in particular on the complexity of the corpus it developed, structured, and disseminated. The CoMeRe corpus is a reference corpus for computer-mediated communication (CMC) in French comprising fourteen sub-corpora. Fourteen researchers from eight different laboratories were involved in the project, and three key words guided their collaboration: variety, standards, and open access. The CoMeRe corpus is composed of a wide range of heterogeneous CMC genres (emails, text chat, SMS, Internet discussion forums, blogs, tweets, Wikipedia discussions, interactions from synthetic worlds). In the first section of the article, we outline the main characteristics of the different CMC genres and highlight their similarities and differences. We then describe the choices made to support corpus interoperability: the fourteen sub-corpora were structured in a standardized manner in accordance with the Interaction Space model developed within the project (Chanier & Jin, 2013) and, in collaboration with European partners, following guidelines for standardizing CMC corpora in TEI (Text Encoding Initiative, 2019). The CoMeRe corpus was released in an open-access format to encourage future use by the scientific community. In the article's conclusion, we discuss the implications of the corpus's dissemination.
Document type : Journal articles
Contributor : Ciara R. Wigham
Submitted on : Monday, February 3, 2020 - 4:42:07 PM
Last modification on : Wednesday, February 24, 2021 - 1:32:01 PM
Long-term archiving on : Monday, May 4, 2020 - 12:17:02 PM




  • HAL Id : halshs-02460613, version 1


Ciara R. Wigham, Céline Poudat. Corpus complexes et standards : un retour sur le projet CoMeRe. Corpus, Bases, Corpus, Langage - UMR 7320, 2020. ⟨halshs-02460613⟩


