Preprints, Working Papers, ...

Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

Abstract: This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. The use of a recent lemmatiser based on neural networks and a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test, and the models prove robust in out-of-domain tests, i.e. up to 20th-century novels.

Cited literature [32 references]

https://halshs.archives-ouvertes.fr/halshs-02591388
Contributor: Jean-Baptiste Camps
Submitted on: Friday, May 15, 2020 - 3:04:29 PM
Last modification on: Tuesday, July 21, 2020 - 3:25:01 AM

Files

corpusAndModels.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution - ShareAlike 4.0 International License

Citation

Jean-Baptiste Camps, Simon Gabay, Paul Fièvre, Thibault Clérice, Florian Cafiero. Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre. 2020. ⟨halshs-02591388⟩

Metrics

Record views: 67
File downloads: 65