Preprints, Working Papers, ...

Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

Abstract : This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. Using a recent lemmatiser based on neural networks together with a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test, and the approach proves robust in out-of-domain tests, i.e. on texts up to 20th-century novels.

https://halshs.archives-ouvertes.fr/halshs-02591388
Contributor : Jean-Baptiste Camps
Submitted on : Friday, February 5, 2021 - 4:26:53 PM
Last modification on : Wednesday, July 7, 2021 - 4:50:02 PM

File

Corpus_models_Classical_French...
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution - ShareAlike 4.0 International License

Citation

Jean-Baptiste Camps, Simon Gabay, Paul Fièvre, Thibault Clérice, Florian Cafiero. Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre. 2020. ⟨halshs-02591388v2⟩
