Searching for Discriminative Metadata of Heterogenous Corpora - HAL-SHS - Sciences de l'Homme et de la Société
Conference paper, Year: 2015

Searching for Discriminative Metadata of Heterogenous Corpora

Abstract

In this paper, we use machine learning techniques for part-of-speech tagging and parsing to explore the specificities of a highly heterogeneous corpus. The corpus is a treebank of Old French made up of texts that differ with respect to several types of metadata: production date, form (verse/prose), domain, and dialect. We conduct experiments to determine which of these metadata are the most discriminative and to induce a general methodology.
Main file
guibon_al_TLT2015.pdf (112.25 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01250981, version 1 (05-01-2016)

Identifiers

  • HAL Id: hal-01250981, version 1

Cite

Gaël Guibon, Isabelle Tellier, Sophie Prévost, Mathieu Constant, Kim Gerdes. Searching for Discriminative Metadata of Heterogenous Corpora. Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), Dec 2015, Warsaw, Poland. pp.72-82. ⟨hal-01250981⟩
