The CoMeRe French CMC corpora and their modeling in TEI - HAL-SHS - Sciences de l'Homme et de la Société
Conference paper, Year: 2015

The CoMeRe French CMC corpora and their modeling in TEI

Abstract

CoMeRe (a French acronym for network-mediated communication) is a national project in which researchers from eight research units are developing a repository of CMC corpora, all modeled within the same extension of the TEI (Chanier et al. 2014). The project was carried out from 2013 to 2015 with the support of Corpus-Ecrits (http://corpusecrits.huma-num.fr/, a national research consortium on written corpora) and Ortolang (http://www.ortolang.fr, a national infrastructure for tools and corpora on the French language). Three key principles underlie CoMeRe: variety, openness and standards. “Variety” is one of our keywords since we have assembled interactions stemming from networks such as the Internet or telecommunications (mobile phones), covering mono- and multimodal as well as synchronous and asynchronous communication. The genres covered within CoMeRe include text or oral chats, email, discussion forums, blogs, tweets, audio-graphic conferencing systems (conference systems with text, audio, and iconic signs for communication), and even collaborative working/learning environments with verbal and nonverbal communication. “Openness” is our second keyword. The first set of 11 corpora has been released (http://hdl.handle.net/11403/comere) as open data on Ortolang. Our wish to release CoMeRe corpora as open data stems from the fact that, although studies on new CMC genres draw much attention, there is currently no dataset with sufficient coverage to form the basis for systematic research. “Standards” refers to two different aspects. Firstly, corpora have been structured and referenced in a uniform way: TEI-IS is the model developed as an extension of the TEI in order to encompass the Interaction Space (IS) of CMC multimodal discourse. Secondly, “Standards” refers to a uniform basic level of automatic annotation, namely segmentation and part-of-speech (POS) tagging, which is underway.

Domains

Linguistics
201510-comere-irdrennes.pdf (7.5 MB)
201510-comere-irdrennes.pptx (4.76 MB)
teicmc4rennes.pdf (192.17 KB)
Origin: Files produced by the author(s)

Dates and versions

halshs-01222979, version 1 (02-11-2015)

License

Attribution

Identifiers

  • HAL Id: halshs-01222979, version 1

Cite

Thierry Chanier, Céline Poudat, Ciara Wigham. The CoMeRe French CMC corpora and their modeling in TEI. ird-cmc-rennes: Social Media and CMC Corpora for the eHumanities, Oct 2015, Rennes, France. ⟨halshs-01222979⟩
182 views
234 downloads
