
Un concordancier multi-niveaux et multimédia pour des corpus oraux [A multi-level multimedia concordancer for spoken corpora]

Abstract : Concordances have long played an important role in the analysis of language corpora in the humanities, literature, linguistics, translation and language teaching. However, very few of the available systems support multi-level queries against a richly annotated, sound-aligned spoken corpus. The rapid growth of spoken corpora, particularly for French, increases the need for scalable, high-performance solutions. We present the preliminary results of our project to develop a multi-level multimedia concordancer for spoken language corpora. We test our prototype on the PFC corpus of spoken French (1.5 million tokens, transcriptions aligned at the utterance level). Our tool allows researchers to query the corpus and produce concordances correlating several annotation levels (part-of-speech tags, lemmas, annotation of phonological phenomena such as liaison and schwa, etc.) while providing multimodal access to the associated sound recordings and other data.
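To make the idea of a multi-level query concrete, here is a minimal Python sketch of a keyword-in-context (KWIC) search that filters on several annotation levels at once and returns the time span of the utterance containing each hit. It is not the authors' implementation: the `Token` and `Utterance` classes, the `concordance` function, the tag set, the sample sentences and the timestamps are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str          # surface form as transcribed
    lemma: str         # lemma annotation level
    pos: str           # part-of-speech annotation level (toy tag set)
    utterance_id: int  # utterance the token belongs to


@dataclass
class Utterance:
    start: float  # start time in seconds (toy values)
    end: float    # end time in seconds (toy values)


def concordance(tokens, utterances, window=2, **criteria):
    """Return KWIC hits for tokens matching every criterion.

    Each criterion names an annotation level, e.g. pos="NOUN", lemma="ami".
    A hit is (left context, keyword, right context, start, end), where
    start/end are the times of the utterance containing the keyword.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if all(getattr(tok, level) == value for level, value in criteria.items()):
            left = " ".join(t.form for t in tokens[max(0, i - window):i])
            right = " ".join(t.form for t in tokens[i + 1:i + 1 + window])
            utt = utterances[tok.utterance_id]
            hits.append((left, tok.form, right, utt.start, utt.end))
    return hits


# Toy data: two utterances with invented annotations and timestamps.
UTTERANCES = {0: Utterance(0.0, 1.8), 1: Utterance(1.8, 3.1)}
TOKENS = [
    Token("les", "le", "DET", 0),
    Token("amis", "ami", "NOUN", 0),
    Token("sont", "être", "AUX", 0),
    Token("arrivés", "arriver", "VERB", 0),
    Token("un", "un", "DET", 1),
    Token("petit", "petit", "ADJ", 1),
    Token("ami", "ami", "NOUN", 1),
]

# Query two levels together: lemma "ami" AND part of speech NOUN.
hits = concordance(TOKENS, UTTERANCES, window=2, lemma="ami", pos="NOUN")
for left, keyword, right, start, end in hits:
    print(f"{left:>12} [{keyword}] {right:<14} {start:.1f}-{end:.1f} s")
```

In a real tool the returned time span would presumably be used to play back the matching stretch of the recording; here it is simply printed next to each concordance line.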
Document type : Conference papers

Cited literature [9 references]
Contributor : Laboratoire Modyco
Submitted on : Friday, November 7, 2014 - 1:25:49 PM
Last modification on : Tuesday, March 2, 2021 - 10:06:58 AM
Long-term archiving on : Friday, April 14, 2017 - 2:37:19 PM


Barreca 2014 concordancier.pdf
Files produced by the author(s)


  • HAL Id : halshs-01078133, version 1


Giulia Barreca, George Christodoulides. Un concordancier multi-niveaux et multimédia pour des corpus oraux. 21e Conférence sur le Traitement automatique des Langues Naturelles (TALN 2014), Jul 2014, Marseille, France. ⟨halshs-01078133⟩


