Un concordancier multi-niveaux et multimédia pour des corpus oraux (A multi-level multimedia concordancer for spoken corpora)

Abstract: Concordances have always played an important role in the analysis of language corpora, for studies in the humanities, literature, linguistics, translation and language teaching. However, very few of the available systems support multi-level queries against a richly annotated, sound-aligned spoken corpus. The rapid growth in the development of spoken corpora, particularly for French, increases the need for scalable, high-performance solutions. We present the preliminary results of our project to develop a multi-level multimedia concordancer for spoken language corpora. We test our prototype on the PFC corpus of spoken French (1.5 million tokens, with transcriptions aligned at the utterance level). Our tool allows researchers to query the corpus and produce concordances that correlate several annotation levels (e.g. part-of-speech tags, lemmas, and annotations of phonological phenomena such as liaison and schwa) while providing multimodal access to the associated sound recordings and other data.
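To make the idea of a multi-level, sound-linked concordance more concrete, here is a minimal Python sketch. It is not the authors' prototype and does not follow the actual PFC annotation scheme: the `Token` and `Utterance` classes, the `concordance` function, the `liaison:z` / `liaison:t` codes and the timestamps are all hypothetical illustrations. It assumes each token carries several annotation levels (form, lemma, POS, phonology) and, since the corpus is aligned only at the utterance level, each hit returns the time span of its whole utterance so that a media player could seek to it.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple


@dataclass
class Token:
    form: str        # orthographic word form
    lemma: str       # lemma
    pos: str         # part-of-speech tag
    phono: str = ""  # hypothetical phonological annotation, e.g. "liaison:z"


@dataclass
class Utterance:
    start: float           # utterance start time in the recording (seconds)
    end: float             # utterance end time (seconds)
    tokens: List[Token]


# A KWIC hit: (left context, keyword, right context, (start, end) of utterance)
Hit = Tuple[str, str, str, Tuple[float, float]]


def concordance(utterances: List[Utterance],
                match: Callable[[Token], bool],
                width: int = 3) -> Iterator[Hit]:
    """Yield KWIC hits for every token satisfying `match`.

    `match` may test any combination of annotation levels on a token.
    Alignment is assumed to be utterance-level only, so each hit carries
    the time span of its whole utterance rather than of the word itself.
    """
    for utt in utterances:
        toks = utt.tokens
        for i, tok in enumerate(toks):
            if match(tok):
                left = " ".join(t.form for t in toks[max(0, i - width):i])
                right = " ".join(t.form for t in toks[i + 1:i + 1 + width])
                yield left, tok.form, right, (utt.start, utt.end)


# Hypothetical toy data: one utterance with invented timestamps and liaison codes.
toy = [
    Utterance(12.4, 14.1, [
        Token("les", "le", "DET", "liaison:z"),
        Token("enfants", "enfant", "NOUN"),
        Token("sont", "être", "AUX", "liaison:t"),
        Token("arrivés", "arriver", "VERB"),
    ]),
]

if __name__ == "__main__":
    # Query combining two levels: determiners that carry a liaison annotation.
    for left, kw, right, (s, e) in concordance(
            toy, lambda t: t.pos == "DET" and t.phono.startswith("liaison")):
        print(f"{left:>25} [{kw}] {right:<25} {s:.1f}-{e:.1f}s")
```

Because `match` is an arbitrary predicate, a query can mix levels freely (here POS plus phonology, or a lemma such as "être" alone). The linear scan is only for illustration; on a corpus of 1.5 million tokens a real system would more plausibly rely on per-level indexes.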
Document type: Conference paper

Cited literature: 9 references

https://halshs.archives-ouvertes.fr/halshs-01078133
Contributor: Laboratoire Modyco
Submitted on: Friday, November 7, 2014 - 1:25:49 PM
Last modification on: Wednesday, July 4, 2018 - 11:14:05 PM
Long-term archiving on: Friday, April 14, 2017 - 2:37:19 PM

File

Barreca 2014 concordancier.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: halshs-01078133, version 1

Citation

Giulia Barreca, George Christodoulides. Un concordancier multi-niveaux et multimédia pour des corpus oraux. 21e Conférence sur le Traitement automatique des Langues Naturelles (TALN 2014), Jul 2014, Marseille, France. ⟨halshs-01078133⟩
