Conference papers

The TXM Portal Software giving access to Old French Manuscripts Online

Abstract :
This paper presents the new TXM software platform, which gives online access to images of Old French manuscripts and to their tagged transcriptions for concordancing and text mining. The platform imports medieval sources encoded in XML according to the TEI Guidelines for linking manuscript images to transcriptions and for encoding several diplomatic levels of transcription, including abbreviations and word-level corrections. It includes a sophisticated tokenizer that handles TEI tags at different levels of the linguistic hierarchy. Words are tagged on the fly during the import process using the IMS TreeTagger tool with a specific language model. Synoptic editions displaying manuscript images and text transcriptions side by side are produced automatically during the import process. Texts are organized in a corpus with their own metadata (title, author, date, genre, etc.), and several word-property indexes are produced for the CQP search engine, allowing efficient word-pattern searches for building different types of frequency lists and concordances. For syntactically annotated texts, special indexes are produced for the Tiger Search engine to allow efficient building of syntactic concordances. The platform has also been tested on classical Latin, ancient Greek, Old Slavonic and Old Hieroglyphic Egyptian corpora (including various types of encoding and annotation).
Document type :
Conference papers
Contributor : Alexei Lavrentiev
Submitted on : Friday, November 30, 2012 - 2:57:54 PM
Last modification on : Tuesday, May 12, 2020 - 3:56:12 PM


  • HAL Id : halshs-00759361, version 1



Alexei Lavrentiev, Serge Heiden. The TXM Portal Software giving access to Old French Manuscripts Online. 7th International Conference on Language Resources and Evaluation (LREC), May 2012, Istanbul, Turkey. pp.29-35. ⟨halshs-00759361⟩


