ETIQUETAGE d'un CORPUS HETEROGENE de FRANÇAIS MEDIEVAL: ENJEUX et MODALITES

Abstract : We have undertaken the morpho-syntactic tagging of the 2.5 million words in our corpora of medieval texts. The external and internal heterogeneity of the texts makes this task difficult, so we had to adopt a two-step strategy. Since no existing tool is currently adapted to our corpora, we first relied on a programmable tagger to categorize a first text. Then, building on the results obtained with that text, we produced a tagger based on contextual rule learning. Using this second tool, we tagged a second text that is quite "similar" to the first in terms of external criteria. This two-step process was then applied again to tag additional texts.
The next phase will be to evaluate the heterogeneity of the texts according to internal criteria. Correlating internal and external heterogeneity will enable us to develop a "fine-grained" typology of texts.
Document type :
Book sections

https://halshs.archives-ouvertes.fr/halshs-00087995
Contributor : Sophie Prevost
Submitted on : Thursday, July 27, 2006 - 7:31:35 PM
Last modification on : Thursday, June 6, 2019 - 2:40:40 PM
Long-term archiving on : Monday, April 5, 2010 - 10:28:23 PM

Identifiers

  • HAL Id : halshs-00087995, version 1

Citation

Serge Heiden, Sophie Prévost. ETIQUETAGE d'un CORPUS HETEROGENE de FRANÇAIS MEDIEVAL: ENJEUX et MODALITES. C.D. Pusch et W. Raible. Romance Corpus Linguistics - Corpora and Spoken Language, Tübingen, Gunter Narr Verlag Tübingen, p. 127-136, 2002. ⟨halshs-00087995⟩
