À la croisée des langues. Annotation et fouille de corpus plurilingues

Abstract : As part of a research programme studying language contact phenomena and their role in linguistic change, there is currently an effort to collect plurilingual corpora exhibiting a great variety of contact phenomena across a sample of languages of various genetic and typological backgrounds. This has required developing specific document-processing software for digital corpora with internal plurilingualism, in order to represent, store, annotate, and visualize their linguistic data, and to build data mining tools. Existing encoding standards have been extended to cope with phenomena such as speech segments "floating" between languages, which occur in plurilingual talk. In this article, we describe the structure that has been defined for the plurilingual corpora, and the underlying definition of plurilingual linguistic units that is used for statistical analysis of the corpora.
Document type :
Journal articles

https://halshs.archives-ouvertes.fr/halshs-01063067
Contributor : Isabelle Léglise
Submitted on : Thursday, September 11, 2014 - 11:56:34 AM
Last modification on : Thursday, March 21, 2019 - 2:17:28 PM
Long-term archiving on : Friday, December 12, 2014 - 10:24:31 AM

File

Croisee_des_langues.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : halshs-01063067, version 1

Citation

Pascal Vaillant, Isabelle Léglise. À la croisée des langues. Annotation et fouille de corpus plurilingues. Revue des Nouvelles Technologies de l'Information, Hermann, 2014, RNTI-SHS-2, pp.81-100. ⟨halshs-01063067⟩
