Journal article. Journal of Data Mining and Digital Humanities, Year: 2021

Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

Abstract

This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. The use of a recent lemmatiser based on neural networks and a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test, and proves to be robust during out-of-domain tests, i.e. on texts as late as 20th-century novels.
Main file
Corpus_models_Classical_French_v2.pdf (687.4 KB)
Origin: Files produced by the author(s)

Dates and versions

halshs-02591388 , version 1 (15-05-2020)
halshs-02591388 , version 2 (05-02-2021)

Licence

Attribution - ShareAlike

Cite

Jean-Baptiste Camps, Simon Gabay, Paul Fièvre, Thibault Clérice, Florian Cafiero. Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre. Journal of Data Mining and Digital Humanities, 2021, ⟨10.46298/jdmdh.6485⟩. ⟨halshs-02591388v2⟩
