Abstract: Transcription of speech is an important part of language documentation, and yet speech recognition technology has not been widely harnessed to aid linguists. We explore the use of a neural network architecture with the connectionist temporal classification loss function for phonemic and tonal transcription in a language documentation setting. In this framework, we explore jointly modelling phonemes and tones versus modelling them separately, and assess the importance of pitch information versus phonemic context for tonal prediction. Experiments on two tonal languages, Yongning Na and Eastern Chatino, show the changes in recognition performance as training data is scaled from 10 minutes to 150 minutes. We discuss the findings from incorporating this technology into the linguistic workflow for documenting Yongning Na, which show the method's promise in improving efficiency, minimizing typographical errors, and maintaining the transcription's faithfulness to the acoustic signal, while highlighting phonetic and phonemic facts for linguistic consideration.
https://halshs.archives-ouvertes.fr/halshs-01656683
Oliver Adams, Trevor Cohn, Graham Neubig, Alexis Michaud. Phonemic transcription of low-resource tonal languages. Australasian Language Technology Association Workshop 2017, Dec 2017, Brisbane, Australia. pp.53-60. ⟨halshs-01656683⟩
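
As a concrete illustration of the approach named in the abstract, the sketch below trains a bidirectional recurrent network with the connectionist temporal classification (CTC) loss over a joint phoneme-and-tone label set, using PyTorch's nn.CTCLoss. The CTCTranscriber class, the layer sizes, feature dimensions, and label inventory are illustrative assumptions for exposition only, not the authors' actual architecture or code.

    import torch
    import torch.nn as nn

    class CTCTranscriber(nn.Module):
        # Hypothetical sketch: a BiLSTM acoustic model trained with CTC over a
        # joint phoneme-and-tone label set. All sizes below are illustrative.
        def __init__(self, n_feats=41, n_hidden=250, n_labels=60):
            super().__init__()
            self.rnn = nn.LSTM(n_feats, n_hidden, num_layers=3,
                               bidirectional=True, batch_first=True)
            self.proj = nn.Linear(2 * n_hidden, n_labels + 1)  # +1 for the CTC blank

        def forward(self, feats):                 # feats: (batch, time, n_feats)
            out, _ = self.rnn(feats)
            return self.proj(out).log_softmax(-1)

    model = CTCTranscriber()
    ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

    feats = torch.randn(2, 300, 41)               # e.g. filterbank (+ pitch) features
    targets = torch.randint(1, 61, (2, 40))       # phoneme/tone indices; 0 is the blank
    log_probs = model(feats).transpose(0, 1)      # nn.CTCLoss expects (time, batch, labels)
    loss = ctc_loss(log_probs, targets,
                    torch.full((2,), 300, dtype=torch.long),
                    torch.full((2,), 40, dtype=torch.long))
    loss.backward()

In this framing, modelling tones separately roughly corresponds to the same training loop with the label set restricted to tonal categories, and withholding pitch features from the input probes how far phonemic context alone carries tonal prediction.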