Towards the automatic processing of Yongning Na (Sino-Tibetan): developing a 'light' acoustic model of the target language and testing 'heavyweight' models from five national languages

Thi-Ngoc-Diep Do; Alexis Michaud; Eric Castelli

Conference paper. Year: 2014

Thi-Ngoc-Diep Do

Role: Corresponding author
PersonId: 955335

Speech Communication

Alexis Michaud

Role: Author
PersonId: 419
IdHAL: alexis-michaud
ORCID: 0000-0003-1165-2680
IdRef: 095131507

Speech Communication

Eric Castelli

Role: Author
PersonId: 750232
IdHAL: eric-castelli
ORCID: 0000-0003-2978-2619
IdRef: 068256256

Speech Communication

Abstract

Automatic speech processing technologies hold great potential to facilitate the urgent task of documenting the world's languages. The present research aims to explore the application of speech recognition tools to a little-documented language, with a view to facilitating processes of annotation, transcription and linguistic analysis. The target language is Yongning Na (a.k.a. Mosuo), an unwritten Sino-Tibetan language with fewer than 50,000 speakers. An acoustic model of Na was built using CMU Sphinx. In addition to this 'light' model, trained on a small data set (only 4 hours of speech from a single speaker), 'heavyweight' models from five national languages (English, French, Chinese, Vietnamese and Khmer) were also applied to the same data. Preliminary results are reported, and perspectives for the long road ahead are outlined.

Keywords

Acoustic models, automatic speech recognition (ASR), multilingual modelling, under-resourced languages, endangered languages, Yongning Na, Naish languages, language portability, statistical language modeling, crosslingual acoustic modelling and adaptation

Domains

Linguistics; Signal and image processing [eess.SP]

Full metadata list

Deposit format	File
Deposit type	Conference paper
Author(s)	Thi-Ngoc-Diep Do ¹, Alexis Michaud ², Eric Castelli ¹. ¹ Speech Communication (397283): Viêt Nam International Research Institute MICA (121592); Institut National Polytechnique de Grenoble (300275); Hanoi University of Science and Technology (321656); Centre National de la Recherche Scientifique UMI2954 (441569). ² Speech Communication (393821): Viêt Nam International Research Institute MICA (121592); Institut National Polytechnique de Grenoble (300275); Hanoi University of Science and Technology (321656); Centre National de la Recherche Scientifique UMI2954 (441569)
Popular science	No
Peer-reviewed	Yes
Proceedings	Yes
Invited	No
Document language	English
Book title	Proceedings of the 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2014)
Audience	International
Publication date	2014
Page/Identifier	153-160
Conference title	4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2014)
Conference start date	2014-05-14
Conference end date	2014-05-16
City	St Petersburg
Country	Russia
Domain(s)	Humanities and Social Sciences/Linguistics; Engineering Sciences [physics]/Signal and image processing [eess.SP]; Computer Science [cs]/Signal and image processing [eess.SP]
ANR project(s)	Corpus parallèles en langues himalayennes, HimalCo - ANR-12-CORP-0006 Corpus - 2012
Funding	ANR-10-LABX-0083 LabEx EFL - Investissements d'Avenir
Keywords	Acoustic models, automatic speech recognition (ASR), multilingual modelling, under-resourced languages, endangered languages, Yongning Na, Naish languages, language portability, statistical language modeling, crosslingual acoustic modelling and adaptation

Main file

SLTU2014_Do_Michaud_Castelli_FINAL.pdf (325.17 KB)

Origin: Files produced by the author(s)

Contributor: Alexis Michaud

https://shs.hal.science/halshs-00980431

Submitted on: Sunday 25 May 2014 at 16:04:32

Last modified on: Thursday 4 April 2024 at 21:13:44

Long-term archiving on: Tuesday 11 April 2017 at 01:38:18

Dates and versions

halshs-00980431, version 1 (18-04-2014)

halshs-00980431, version 2 (25-05-2014)

Identifiers

HAL Id: halshs-00980431, version 2

Cite

Thi-Ngoc-Diep Do, Alexis Michaud, Eric Castelli. Towards the automatic processing of Yongning Na (Sino-Tibetan): developing a 'light' acoustic model of the target language and testing 'heavyweight' models from five national languages. 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2014), May 2014, St Petersburg, Russia. pp.153-160. ⟨halshs-00980431v2⟩

Collections

UGA CNRS GENCI ASIES_ET_PACIFIQUE ANR

3130 views

352 downloads

Last updated on 20/04/2024
