The TXM Portal Software giving access to Old French Manuscripts Online

Abstract: http://www.lrec-conf.org/proceedings/lrec2012/workshops/13.ProceedingsCultHeritage.pdf
This paper presents the new TXM software platform, which gives online access to images of Old French text manuscripts and to their tagged transcriptions for concordancing and text mining. The platform can import medieval sources encoded in XML according to the TEI Guidelines for linking manuscript images to transcriptions and for encoding several diplomatic levels of transcription, including abbreviations and word-level corrections. It includes a sophisticated tokenizer able to deal with TEI tags at different levels of the linguistic hierarchy. Words are tagged on the fly during the import process using the IMS TreeTagger tool with a specific language model. Synoptic editions displaying manuscript images and text transcriptions side by side are produced automatically during the import process. Texts are organized in a corpus with their own metadata (title, author, date, genre, etc.), and several word-property indexes are produced for the CQP search engine to allow efficient word-pattern searches for building different types of frequency lists or concordances. For syntactically annotated texts, special indexes are produced for the Tiger Search engine to allow efficient building of syntactic concordances. The platform has also been tested on classical Latin, ancient Greek, Old Slavonic and Old Hieroglyphic Egyptian corpora (including various types of encoding and annotations).
Document type:
Conference paper
7th International Conference on Language Resources and Evaluation (LREC), May 2012, Istanbul, Turkey. pp.29-35, 2012

https://halshs.archives-ouvertes.fr/halshs-00759361
Contributor: Alexei Lavrentiev
Submitted on: Friday, 30 November 2012 - 14:57:54
Last modified on: Tuesday, 21 June 2016 - 09:33:20

Identifiers

  • HAL Id : halshs-00759361, version 1

Citation

Alexei Lavrentiev, Serge Heiden. The TXM Portal Software giving access to Old French Manuscripts Online. 7th International Conference on Language Resources and Evaluation (LREC), May 2012, Istanbul, Turkey. pp.29-35, 2012. <halshs-00759361>
