An authorship attribution experiment: the Saint-Jean corpus

Abstract: In collaboration with J. Savoy, a corpus was compiled to test authorship attribution methods (200 excerpts drawn from 68 novels by 31 authors). Vocabulary differences between texts are measured by the intertextual distance. With the nearest-neighbour method, all excerpts are correctly attributed, but this requires every author to have at least two texts in the corpus. When this condition is not met, the smallest distances are used, each associated with a confidence interval. This method attributes 8 excerpts out of 10 without error. Two classifications (hierarchical clustering and tree classification) lead to the same results. A standardized scale of the intertextual distance makes it possible to attribute a text simply and safely without repeating the whole procedure.
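To make the two ideas in the abstract concrete, here is a minimal Python sketch of a vocabulary-based distance combined with nearest-neighbour attribution. It assumes a simplified form of the intertextual distance: the word frequencies of the longer text are scaled down to the length of the shorter one, and the absolute frequency differences are summed and normalized to the range 0 to 1. The report's exact definition, tokenization, and thresholds may differ. The function names and the toy data are illustrative only and do not come from the report.

```python
from collections import Counter


def intertextual_distance(tokens_a, tokens_b):
    """Simplified intertextual distance between two token lists.

    Returns 0.0 for identical frequency profiles and 1.0 for texts
    that share no vocabulary. This is an assumed, simplified formula,
    not necessarily the one used in the report.
    """
    # Make tokens_a the shorter text so that the longer one is scaled down.
    if len(tokens_a) > len(tokens_b):
        tokens_a, tokens_b = tokens_b, tokens_a
    n_a, n_b = len(tokens_a), len(tokens_b)
    if n_a == 0:
        raise ValueError("both texts must contain at least one token")

    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    scale = n_a / n_b  # reduces the longer text to the size of the shorter
    vocabulary = set(freq_a) | set(freq_b)
    total_diff = sum(abs(freq_a[w] - freq_b[w] * scale) for w in vocabulary)
    # After scaling, both texts "weigh" n_a tokens, hence the 2 * n_a.
    return total_diff / (2 * n_a)


def nearest_neighbour(excerpt, labelled_texts):
    """Return the author of the labelled text closest to the excerpt.

    labelled_texts is a list of (author, tokens) pairs.
    """
    author, _ = min(
        labelled_texts,
        key=lambda item: intertextual_distance(excerpt, item[1]),
    )
    return author


# Toy data, invented for illustration.
corpus = [
    ("A", "the cat sat on the rug".split()),
    ("B", "stock markets fell sharply today again".split()),
]
unknown = "the cat sat on the mat".split()
attributed_author = nearest_neighbour(unknown, corpus)
```

In this toy case the unknown excerpt shares most of its vocabulary with the text labelled "A", so that label is returned. As the abstract notes, nearest-neighbour attribution of this kind can only succeed when the true author has at least one other text in the comparison set.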
https://halshs.archives-ouvertes.fr/halshs-01627373
Contributor: Dominique Labbé
Submitted on: Wednesday, November 1, 2017 - 12:10:40 PM
Last modification on: Tuesday, March 13, 2018 - 4:40:06 PM
Long-term archiving on: Friday, February 2, 2018 - 12:56:30 PM

File

LabbeSaintJean2017.pdf
Files produced by the author(s)

Collections

CNRS | LARA | PACTE | UGA

Citation

Dominique Labbé. Une expérience d'attribution d'auteur. Le corpus Saint-Jean. [Research report] PACTE - Université Grenoble Alpes. 2017. ⟨halshs-01627373⟩
