Parsing Poorly Standardized Language Dependency on Old French

Abstract: This paper presents results of dependency parsing of Old French, a language that is poorly standardized at the lexical level and displays a relatively free word order. The work is carried out on five distinct sample texts extracted from the dependency treebank Syntactic Reference Corpus of Medieval French (SRCMF). Following Achim Stein's previous work, we trained the Mate parser on each sub-corpus and cross-validated the results. We show that the greater lexical variation of Old French diminishes parsing efficiency compared with parse results on modern French. To improve the POS tagging step of the parsing process, we preprocessed the data and compared two distinct strategies: one uses a slightly post-processed version of the TreeTagger as trained on Old French by Stein; the other uses a CRF trained on the texts and enriched with external resources. The CRF version outperforms every other approach.

https://hal.archives-ouvertes.fr/hal-01250959
Contributor: Sophie Prevost
Submitted on: Tuesday, January 5, 2016 - 3:55:06 PM
Last modification on: Tuesday, July 23, 2019 - 4:16:13 PM
Long-term archiving on: Thursday, April 7, 2016 - 3:24:01 PM

File

guibon_al_TLT14.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01250959, version 1

Citation

Gaël Guibon, Isabelle Tellier, Mathieu Constant, Sophie Prévost, Kim Gerdes. Parsing Poorly Standardized Language Dependency on Old French. Thirteenth International Workshop on Treebanks and Linguistic Theories (TLT13), Dec 2014, Tübingen, Germany. pp.51-61. ⟨hal-01250959v1⟩
