Conference papers

The CoMeRe French CMC corpora and their modeling in TEI

Abstract : CoMeRe (a French acronym meaning network-mediated communication) is a national project involving researchers from eight research units to develop a repository of CMC corpora, all modeled within the same extension of the TEI (Chanier et al. 2014). The project was carried out from 2013 to 2015 with the support of Corpus-Ecrits (a national research consortium on written corpora) and Ortolang (a national infrastructure for tools and corpora on the French language). Three key principles underlie CoMeRe: variety, openness and standards. “Variety” is one of our keywords since we have assembled interactions stemming from networks such as the Internet or telecommunications (mobile phones), covering mono- and multimodal as well as synchronous and asynchronous communication. The genres covered within CoMeRe include text or oral chats, email, discussion forums, blogs, tweets, audio-graphic conferencing systems (conference systems with text, audio, and iconic signs for communication), and even collaborative working/learning environments with verbal and nonverbal communication. “Openness” is our second keyword. The first set of 11 corpora has been released as open data on Ortolang. Our wish to release CoMeRe corpora as open data stems from the fact that, although studies on new CMC genres draw much attention, there is currently no dataset with sufficient coverage to form the basis for systematic research. “Standards” refers to two different aspects. Firstly, corpora have been structured and referred to in a uniform way: TEI-IS is the model developed as an extension of the TEI to encompass the Interaction Space (IS) of CMC multimodal discourse. Secondly, “standards” refers to a uniform basic level of automatic annotation, covering segmentation and part-of-speech (POS) tagging, which is underway.
Document type : Conference papers

Contributor : Thierry Chanier
Submitted on : Monday, November 2, 2015 - 3:55:23 PM
Last modification on : Wednesday, October 14, 2020 - 4:22:53 AM


Distributed under a Creative Commons Attribution 4.0 International License


  • HAL Id : halshs-01222979, version 1


Thierry Chanier, Céline Poudat, Ciara Wigham. The CoMeRe French CMC corpora and their modeling in TEI. ird-cmc-rennes: Social Media and CMC Corpora for the eHumanities, Oct 2015, Rennes, France. ⟨halshs-01222979⟩


