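Deux corpus audio transcrits de langues rares (japhug et na) normalisés en vue d'expériences en traitement du signal [Two transcribed audio corpora of rare languages (Japhug and Na), normalized for signal processing experiments]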

Abstract : Two audio corpora of minority languages of China (Japhug and Na), with transcriptions, are proposed as reference data sets for experiments in Natural Language Processing. The data, collected and transcribed in the course of immersion fieldwork, amount to a total of 1,907 minutes in Japhug and 209 minutes in Na. By making them available in an easily accessible and usable form, we hope to facilitate the development and deployment of state-of-the-art NLP tools for the full range of human languages. We present a tool for assembling datasets from the Pangloss Collection (an open archive) in a way that ensures full reproducibility of experiments conducted on these data.
Document type :
Conference papers

https://halshs.archives-ouvertes.fr/halshs-03475436
Contributor : Alexis Michaud
Submitted on : Friday, December 10, 2021 - 9:48:02 PM
Last modification on : Friday, August 5, 2022 - 11:58:04 AM

Licence


Distributed under a Creative Commons Attribution - NonCommercial - ShareAlike 4.0 International License

Identifiers

  • HAL Id : halshs-03475436, version 1

Citation

Benjamin Galliot, Guillaume Wisniewski, Séverine Guillaume, Laurent Besacier, Guillaume Jacques, et al.. Deux corpus audio transcrits de langues rares (japhug et na) normalisés en vue d'expériences en traitement du signal. Journées scientifiques du Groupement de recherche "Linguistique informatique, formelle et de terrain" (GDR LIFT), Dec 2021, Grenoble, France. ⟨halshs-03475436⟩
