Building an Open Morphological Lexicon and Lemmatizing Old French Texts with the TXM Platform

Abstract: This paper reports on the lemmatization of Medieval French texts (9th to 15th centuries) with the TXM platform (http://textometrie.org). The project draws on available lexical resources to compile an open morphological lexicon of Medieval French (FROLEX), which is in turn used to perform automatic lemmatization. In the final stage, a human expert verifies and corrects the lemmas. The methodological solutions proposed, together with the tools developed for TXM to manage lexicons and apply lemmatization, may be used to process other languages, especially those with high variation in spelling and word segmentation practices.
Document type:
Conference paper
Corpus linguistics - 2017, Jun 2017, St-Pétersbourg, Russia. pp.48-52, 2017, Proceedings of the international conference "Corpus linguistics - 2017". 〈https://events.spbu.ru/events/anons/corpora-2017〉

Cited literature: 6 references

https://halshs.archives-ouvertes.fr/halshs-01591122
Contributor: Alexei Lavrentiev
Submitted on: Wednesday, September 20, 2017 - 19:10:49
Last modified on: Thursday, February 7, 2019 - 15:27:22

File

Lavrentiev-Heiden-Decorde.pdf
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : halshs-01591122, version 1

Citation

Alexei Lavrentiev, Serge Heiden, Matthieu Decorde. Building an Open Morphological Lexicon and Lemmatizing Old French Texts with the TXM Platform. Corpus linguistics - 2017, Jun 2017, St-Pétersbourg, Russia. pp.48-52, 2017, Proceedings of the international conference "Corpus linguistics - 2017". 〈https://events.spbu.ru/events/anons/corpora-2017〉. 〈halshs-01591122〉

Metrics

Record views: 95
File downloads: 106