Journal articles

Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

Abstract: This paper describes the process of building an annotated corpus and training lemmatisation and POS-tagging models for classical French literature, with a focus on theatre, and particularly comedies in verse. The work was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. The use of a recent lemmatiser based on neural networks and a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test, and the models prove robust in out-of-domain tests, i.e. on texts up to 20th-century novels.

https://halshs.archives-ouvertes.fr/halshs-02591388
Contributor: Jean-Baptiste Camps
Submitted on: Friday, February 5, 2021 - 4:26:53 PM
Last modification on: Thursday, September 23, 2021 - 5:56:54 PM

File

Corpus_models_Classical_French...
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution - ShareAlike 4.0 International License

Citation

Jean-Baptiste Camps, Simon Gabay, Paul Fièvre, Thibault Clérice, Florian Cafiero. Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre. Journal of Data Mining and Digital Humanities, Episciences.org, 2021, ⟨10.46298/jdmdh.6485⟩. ⟨halshs-02591388v2⟩
