The TXM Portal Software giving access to Old French Manuscripts Online

Alexei Lavrentiev; Serge Heiden

Conference paper. Year: 2012

Alexei Lavrentiev

Role: Author
PersonId: 2718
IdHAL: alavrent
ORCID: 0000-0001-8306-3653
IdRef: 117944688

Interactions, Corpus, Apprentissages, Représentations

Serge Heiden

Role: Author
PersonId: 7692
IdHAL: serge-heiden
ORCID: 0000-0003-4682-7647
IdRef: 111293383

Interactions, Corpus, Apprentissages, Représentations

Abstract

http://www.lrec-conf.org/proceedings/lrec2012/workshops/13.ProceedingsCultHeritage.pdf
This paper presents the new TXM software platform, which gives online access to images of Old French manuscripts and to their tagged transcriptions for concordancing and text mining. The platform can import medieval sources encoded in XML according to the TEI Guidelines for linking manuscript images to transcriptions and for encoding several diplomatic levels of transcription, including abbreviations and word-level corrections. It includes a sophisticated tokenizer able to deal with TEI tags at different levels of the linguistic hierarchy. Words are tagged on the fly during the import process using the IMS TreeTagger tool with a specific language model. Synoptic editions displaying manuscript images and text transcriptions side by side are produced automatically during the import process. Texts are organized in a corpus with their own metadata (title, author, date, genre, etc.), and several word property indexes are produced for the CQP search engine to allow efficient word pattern searches for building different types of frequency lists or concordances. For syntactically annotated texts, special indexes are produced for the Tiger Search engine to allow efficient building of syntactic concordances. The platform has also been tested on Classical Latin, Ancient Greek, Old Slavonic and Old Hieroglyphic Egyptian corpora (including various types of encoding and annotations).
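As an illustration of what reading "several diplomatic levels of transcription" from TEI-encoded words can involve, here is a minimal Python sketch. It is not TXM's tokenizer: the sample markup, the function names `word_form` and `tokens`, and the level names `diplomatic` and `normalized` are assumptions made for this example. It relies only on the standard TEI elements `<w>`, `<choice>`, `<abbr>` and `<expan>`.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: two words, the second one written with an abbreviation.
SAMPLE = (
    '<text xmlns="http://www.tei-c.org/ns/1.0"><body><p>'
    "<w>li</w> "
    "<w><choice><abbr>chlr</abbr><expan>chevalier</expan></choice></w>"
    "</p></body></text>"
)


def word_form(w, level):
    """Return the text of one <w> element at the requested transcription level.

    level == "diplomatic" keeps <abbr> and skips <expan>;
    any other value keeps <expan> and skips <abbr>.
    """
    skipped = "expan" if level == "diplomatic" else "abbr"
    parts = []

    def walk(el):
        # Ignore the whole subtree of the alternative we do not want.
        if el.tag == "{%s}%s" % (TEI_NS, skipped):
            return
        if el.text:
            parts.append(el.text)
        for child in el:
            walk(child)
            # Text following a child still belongs to the current element.
            if child.tail:
                parts.append(child.tail)

    walk(w)
    return "".join(parts).strip()


def tokens(xml_string, level="diplomatic"):
    """List the word forms of every <w> element, in document order."""
    root = ET.fromstring(xml_string)
    return [word_form(w, level) for w in root.iter("{%s}w" % TEI_NS)]


if __name__ == "__main__":
    print(tokens(SAMPLE, "diplomatic"))
    print(tokens(SAMPLE, "normalized"))
```

A real import pipeline would also have to handle unsegmented text, word-level corrections, and tags that cross word boundaries, none of which this sketch attempts.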

Keywords

Old French, textometry, TEI, tokenizer, synoptic edition

Domains

Linguistics

Full metadata list

Deposit format	Record
Deposit type	Conference paper
Title	en: The TXM Portal Software giving access to Old French Manuscripts Online
Author(s)	Alexei Lavrentiev (1), Serge Heiden (1). (1) ICAR - Interactions, Corpus, Apprentissages, Représentations (51028), 5 av. Pierre Mendès-France, 69676 Bron Cedex, France; École normale supérieure de Lyon (6818); Université Lumière - Lyon 2 (33804); INRP (300042); Ecole Normale Supérieure Lettres et Sciences Humaines (303652); Centre National de la Recherche Scientifique UMR5191 (441569)
Document language	English
Date of production/writing	2012
Source	Proceedings of the 1st Workshop on Adaptation of Language Resources and Tools for Processing Cultural Heritage Objects, Seventh International Conference on Language Resources and Evaluation
Popularization	No
Peer-reviewed	Yes
Invited	No
Audience	International
Proceedings	Yes
Publication date	2012
Page/Identifier	29-35
Conference title	7th International Conference on Language Resources and Evaluation (LREC)
Conference start date	2012-05-21
Conference end date	2012-05-27
City	Istanbul
Country	Turkey
Comment	Full text online: http://www.lrec-conf.org/proceedings/lrec2012/workshops/13.ProceedingsCultHeritage.pdf
Domain(s)	Humanities and Social Sciences/Linguistics
Collaboration/Project	The authors thank the LABEX ASLAN (ANR-10-LABX-0081) of the Université de Lyon for its financial support within the "Investissements d'Avenir" program (ANR-11-IDEX-0007) of the French State, managed by the Agence Nationale de la Recherche (ANR).
Keywords	en: Old French, textometry, TEI, tokenizer, synoptic edition; fr: ancien français, textométrie, tokeniseur, édition synoptique

Contributor: Alexey Lavrentev

https://shs.hal.science/halshs-00759361

Submitted on: Friday, 30 November 2012, 14:57:54

Last modified on: Friday, 12 May 2023, 03:58:27

Dates and versions

halshs-00759361, version 1 (30-11-2012)

Identifiers

HAL Id: halshs-00759361, version 1

Cite

Alexei Lavrentiev, Serge Heiden. The TXM Portal Software giving access to Old French Manuscripts Online. 7th International Conference on Language Resources and Evaluation (LREC), May 2012, Istanbul, Turkey. pp.29-35. ⟨halshs-00759361⟩

Collections

ENS-LYON CNRS UNIV-LYON2 ICAR UDL

124 views

0 downloads

Last updated on 20/04/2024
