Building an Open Morphological Lexicon and Lemmatizing Old French Texts with the TXM Platform

Abstract : This paper reports on the lemmatization of Medieval French texts (9th – 15th centuries) with the TXM platform (http://textometrie.org). The project draws on available lexical resources to compile an open morphological lexicon of Medieval French (FROLEX), which is then used to perform automatic lemmatization. In the final stage, a human expert verifies and corrects the lemmas. The methodological solutions proposed, and the tools developed for TXM to manage lexicons and apply lemmatization, may be used to process other languages, especially those with high variation in spelling and word segmentation practices.
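The workflow described in the abstract (merge lexical resources into one lexicon, lemmatize by lookup, send uncertain cases to a human expert) can be illustrated with a minimal sketch. This is not the TXM implementation or the FROLEX format: the function names, the (form, part of speech, lemma) entry layout, the tag labels, and the toy entries are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_lexicon(*sources):
    """Merge (form, pos, lemma) triples from several sources into one lookup table.

    Hypothetical layout: forms are lowercased so that lookup is case-insensitive,
    and duplicate entries coming from different sources collapse into one.
    """
    lexicon = defaultdict(set)
    for source in sources:
        for form, pos, lemma in source:
            lexicon[form.lower()].add((pos, lemma))
    return lexicon


def lemmatize(tokens, lexicon):
    """Return one (token, lemma, status) triple per token.

    status is "auto" when exactly one lemma is known, "ambiguous" when several
    lemmas compete, and "unknown" when the form is absent from the lexicon.
    The last two are the cases a human expert would need to review.
    """
    results = []
    for token in tokens:
        entries = lexicon.get(token.lower(), ())
        candidates = sorted({lemma for _pos, lemma in entries})
        if len(candidates) == 1:
            results.append((token, candidates[0], "auto"))
        elif candidates:
            results.append((token, "|".join(candidates), "ambiguous"))
        else:
            results.append((token, None, "unknown"))
    return results


# Toy entries (illustrative only, not taken from FROLEX).
source_a = [("rois", "NOM", "roi"), ("reis", "NOM", "roi")]
source_b = [("rois", "NOM", "roi"), ("est", "VER", "estre"), ("est", "NOM", "est")]

lexicon = build_lexicon(source_a, source_b)
annotated = lemmatize(["Li", "rois", "est"], lexicon)
for row in annotated:
    print(row)
```

In this sketch, spelling variants such as "rois" and "reis" resolve to the same lemma, while forms with competing lemmas or no entry at all are only flagged, leaving the final decision to the manual verification stage.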
Document type :
Conference papers

Cited literature [6 references]

https://halshs.archives-ouvertes.fr/halshs-01591122
Contributor : Alexei Lavrentiev
Submitted on : Wednesday, September 20, 2017 - 7:10:49 PM
Last modification on : Thursday, February 7, 2019 - 3:27:22 PM

File

Lavrentiev-Heiden-Decorde.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : halshs-01591122, version 1

Citation

Alexei Lavrentiev, Serge Heiden, Matthieu Decorde. Building an Open Morphological Lexicon and Lemmatizing Old French Texts with the TXM Platform. Corpus linguistics - 2017, St-Petersburg State University; Institute for Linguistic Studies (RAS); Herzen State Pedagogical University of Russia, Jun 2017, St-Pétersbourg, Russia. pp.48-52. ⟨halshs-01591122⟩

Metrics

Record views : 114
File downloads : 121