Journal articles

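Traitement de données issues d’un corpus écrit multilingue. Approche agile pour l’analyse du discours eurorégional [Processing data from a multilingual written corpus: an agile approach to the analysis of Euroregional discourse]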

Abstract: This article presents aspects of a model designed for a corpus of about 600 texts (about 500,000 words) relating to the Euroregions. Complex and heterogeneous in several respects (technical, linguistic, editorial, generic, enunciative), the corpus raises the major challenge of handling multilingual data (French, Italian, Spanish, English, German, Dutch). Processing it required a dedicated reflection and modeling process, which we call "agile" because of its flexible and iterative character. The resulting analysis platform provides results that can support subsequent qualitative analysis of Euroregional discourse. It combines a proven part-of-speech tagger (TreeTagger) with Perl modules and an SQLite database developed to optimize simultaneous multilingual queries and the automatic export of results. The features for locating words in context and co-occurrences, collecting proper nouns, and detecting repeated segments serve as guides for setting out the research needs, the problems encountered, and the proposed solutions. An analysis of repeated expressions of decision and responsibility in the corpus illustrates the approach.
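
As a rough illustration of the repeated-segment detection and multilingual querying mentioned in the abstract, the following is a minimal sketch in Python rather than the authors' Perl modules. The toy sentences, the `segment` table layout, and the `repeated_segments` helper are all assumptions made for illustration and do not come from the article; tokenization is plain whitespace splitting, with no TreeTagger step.

```python
import sqlite3
from collections import Counter


def repeated_segments(tokens, n=3, min_count=2):
    """Return the n-grams that occur at least min_count times in tokens."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}


# Hypothetical toy corpus: one short text per language (not from the article).
texts = {
    "en": "the council takes the decision and the council takes responsibility",
    "fr": "le conseil prend la décision et le conseil prend la responsabilité",
}

# Hypothetical schema: one row per repeated segment, tagged with its language,
# so that a single query can cover every language at once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segment (lang TEXT, ngram TEXT, freq INTEGER)")
for lang, text in texts.items():
    tokens = text.lower().split()  # naive whitespace tokenization
    for gram, freq in repeated_segments(tokens, n=3).items():
        conn.execute(
            "INSERT INTO segment VALUES (?, ?, ?)", (lang, " ".join(gram), freq)
        )

# One query over all languages simultaneously.
rows = conn.execute(
    "SELECT lang, ngram, freq FROM segment ORDER BY lang, ngram"
).fetchall()
for row in rows:
    print(row)
```

Storing the language as a column, rather than keeping one table per language, is one simple way to make a single query span the whole corpus; the actual platform may be organized differently.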

https://halshs.archives-ouvertes.fr/halshs-02168776
Contributor: Marie-Hélène Hermand
Submitted on: Saturday, June 29, 2019 - 11:04:57 AM
Last modification on: Thursday, June 4, 2020 - 10:24:09 AM

Citation

Marie-Hélène Hermand, Emmanuel Thouraud. Traitement de données issues d’un corpus écrit multilingue. Approche agile pour l’analyse du discours eurorégional. SHS Web of Conferences, EDP Sciences, 2015, 20, pp.01009. ⟨10.1051/shsconf/20152001009⟩. ⟨halshs-02168776⟩
