Conference papers

Maninka Reference Corpus: A Presentation

Abstract : An annotated corpus of Guinean Maninka, the Corpus Maninka de Référence (CMR), was published in April 2016. It comprises two subcorpora. One contains texts originally written in Latin-based script (792,778 words), and the other consists of texts written in the N'ko alphabet (3,105,879 words). Both subcorpora can be searched in either Latin-based script or N'ko. CMR was built with the Daba software package, which was developed earlier for the Corpus Bambara de Référence. NoSketchEngine serves as the search tool and was adapted to the right-to-left direction of N'ko writing. All N'ko texts were obtained in electronic format, and most of them were converted from pre-Unicode fonts. The morphological annotation is based on the Malidaba electronic dictionary, which is still at an intermediate stage of compilation; much effort is still needed to bring it to a minimally acceptable state.

Cited literature : 8 references
Contributor : Valentin Vydrin
Submitted on : Thursday, September 1, 2016 - 6:45:59 PM
Last modification on : Monday, November 22, 2021 - 10:33:43 AM
Long-term archiving on : Friday, December 2, 2016 - 9:11:23 PM


Files produced by the author(s)


  • HAL Id : halshs-01358144, version 1


Valentin Vydrin, Andrij Rovenchak, Kirill Maslinsky. Maninka Reference Corpus: A Presentation. TALAf 2016 : Traitement automatique des langues africaines (écrit et parole). Workshop at JEP-TALN-RECITAL 2016, Jul 2016, Paris, France. ⟨halshs-01358144⟩


