Journal articles

Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

Abstract : This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. This work was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. Using a recent lemmatiser based on neural networks together with a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test, and the approach proves robust in out-of-domain tests, i.e. on texts up to 20th-century novels.

https://halshs.archives-ouvertes.fr/halshs-02591388
Contributor : Jean-Baptiste Camps
Submitted on : Friday, February 5, 2021 - 4:26:53 PM
Last modification on : Friday, April 1, 2022 - 3:56:09 AM

File

Corpus_models_Classical_French...
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution - ShareAlike 4.0 International License

Citation

Jean-Baptiste Camps, Simon Gabay, Paul Fièvre, Thibault Clérice, Florian Cafiero. Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre. Journal of Data Mining and Digital Humanities, Episciences.org, 2021, ⟨10.46298/jdmdh.6485⟩. ⟨halshs-02591388v2⟩
