Parsing Poorly Standardized Language Dependency on Old French - HAL-SHS - Sciences de l'Homme et de la Société
Conference paper, Year: 2014

Parsing Poorly Standardized Language Dependency on Old French

Abstract

This paper presents results of dependency parsing of Old French, a language that is poorly standardized at the lexical level and displays relatively free word order. The work is carried out on five distinct sample texts extracted from the dependency treebank Syntactic Reference Corpus of Medieval French (SRCMF). Following Achim Stein's previous work, we trained the Mate parser on each sub-corpus and cross-validated the results. We show that parsing efficiency is lower than in parse results on modern French, owing to the greater lexical variation of Old French. To improve the POS tagging step of the parsing process, we preprocessed the data, comparing two distinct strategies: one using a slightly post-processed version of the TreeTagger trained on Old French by Stein, and the other using a CRF trained on the texts and enriched with external resources. The CRF version outperforms every other approach.
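
The cross-text evaluation described in the abstract, and the way lexical variation can lower scores, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it swaps in a plain word-form lookup tagger in place of the Mate parser, TreeTagger, or CRF, and the two tiny "texts", their spellings, and the tag labels are invented for illustration only. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_lookup_tagger(sentences):
    """Map each word form to its most frequent tag in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for form, tag in sentence:
            counts[form][tag] += 1
    return {form: tags.most_common(1)[0][0] for form, tags in counts.items()}


def evaluate(lexicon, sentences):
    """Return (tagging accuracy, share of tokens unseen in training).

    Unseen forms receive no tag and are therefore counted as errors.
    """
    total = correct = unknown = 0
    for sentence in sentences:
        for form, gold in sentence:
            total += 1
            if form not in lexicon:
                unknown += 1
            elif lexicon[form] == gold:
                correct += 1
    return correct / total, unknown / total


def cross_text_scores(corpora):
    """Train on each text in turn and test on every other text."""
    scores = {}
    for train_name, train_sents in corpora.items():
        lexicon = train_lookup_tagger(train_sents)
        for test_name, test_sents in corpora.items():
            if test_name != train_name:
                scores[(train_name, test_name)] = evaluate(lexicon, test_sents)
    return scores


# Invented toy data: the same noun spelled differently in each text,
# standing in for the lexical variation mentioned in the abstract.
corpora = {
    "text_a": [[("li", "DET"), ("rois", "NOM"), ("vient", "VER")]],
    "text_b": [[("li", "DET"), ("reis", "NOM"), ("vient", "VER")]],
}

scores = cross_text_scores(corpora)
for (train_name, test_name), (accuracy, unknown_rate) in sorted(scores.items()):
    print(f"{train_name} -> {test_name}: "
          f"accuracy={accuracy:.2f}, unseen forms={unknown_rate:.2f}")
```

In this toy setting, every spelling variant that was not seen in the training text becomes an error, which is the kind of effect a tagger with better generalization or external lexical resources would be expected to reduce.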
Main file
guibon_al_TLT14.pdf (106.71 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01250959, version 1 (05-01-2016)
hal-01250959, version 2 (06-01-2016)

Identifiers

  • HAL Id: hal-01250959, version 1

Cite

Gaël Guibon, Isabelle Tellier, Mathieu Constant, Sophie Prévost, Kim Gerdes. Parsing Poorly Standardized Language Dependency on Old French. Thirteenth International Workshop on Treebanks and Linguistic Theories (TLT13), Dec 2014, Tübingen, Germany. pp.51-61. ⟨hal-01250959v1⟩
277 views
177 downloads
