Phonemic transcription of low-resource languages: To what extent can preprocessing be automated? - HAL-SHS - Sciences de l'Homme et de la Société
Preprint, Working Paper. Year: 2020

Abstract

Automatic Speech Recognition for low-resource languages has been an active field of research for more than a decade. It holds promise for facilitating the urgent task of documenting the world's dwindling linguistic diversity. Various methodological hurdles are encountered in the course of this exciting development, however. A well-identified difficulty is that data preprocessing is not at all trivial. The tests reported here (on Yongning Na and other languages from the Pangloss Collection, an open archive of endangered languages) explore some possibilities for automating data preprocessing, assessing to what extent it is possible to bypass the involvement of language experts for menial tasks of data preparation for Natural Language Processing (NLP) purposes. What is at stake is the accessibility of language archive data for a range of NLP tasks and beyond.
Main file
PersephonePangloss_SLTU21March2020.pdf (1.19 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02513914, version 1 (26-03-2020)
hal-02513914, version 2 (18-04-2020)
hal-02513914, version 3 (27-05-2020)

Identifiers

  • HAL Id: hal-02513914, version 1

Cite

Guillaume Wisniewski, Alexis Michaud, Séverine Guillaume. Phonemic transcription of low-resource languages: To what extent can preprocessing be automated? 2020. ⟨hal-02513914v1⟩
647 views
316 downloads
