Encoding Allographs: (ab?)Using the Element

Abstract: This paper discusses the optimal TEI format for storing information of both a textual and a graphical nature about letterforms, from theory to implementation. The Oriflamms research project (http://oriflamms.hypotheses.org) aims to establish an ontology of letterforms in medieval Latin and vernacular writing systems. It uses both previously created and new TEI-encoded manuscript transcriptions, which, however, use slightly different tagsets. These transcriptions are converted to a common, fully TEI-conformant format, tokenized at word and character level, and automatically aligned to zones in manuscript images. The research requires defining an optimal TEI format and reassessing the nature of graphical differences in manuscripts and in transcriptions, such as dots, accents and apostrophes (grammatical or phonetic, e.g. [ou/où]), capitals (for names, sentence beginnings, abbreviations), and letter variants (“allographs”), some of which were later specialized according to phonetics ([i/j], [u/v], [ss/ß]) while others were not ([s/ſ]). Some input transcriptions apply traditional character normalization rules, others do not. Several solutions may be used at word or letter level (// allowed in , not in ): normalized, imitative or neutral transcriptions, with attributes, pointers or elements carrying graphical, semantic or phonetic information. Yet the ‘perfect’ format needs ‘perfect’ information, which extant transcriptions do not provide. This paper presents further use cases and an in-depth discussion of the encoding strategy and its methodological underpinnings. The need for consistency, progressive enhancement and backward compatibility raises the issue of inline versus stand-off markup and of defining a usable TEI-compliant format. The paper will show the possible role of the elements (in association with ).
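To illustrate why normalization at character level loses information, here is a minimal sketch, assuming a hypothetical one-entry normalization table (long s to s) and whitespace-delimited words; it is not the Oriflamms tokenizer. Each character token keeps both its source form and its regularized form, so either view can be rebuilt.

```python
from dataclasses import dataclass

# Hypothetical normalization table: only the long s is mapped here.
# Alternations such as [i/j] or [u/v] depend on position or pronunciation
# and could not be handled by a fixed one-to-one table like this one.
NORMALIZE = {"ſ": "s"}


@dataclass(frozen=True)
class CharToken:
    orig: str  # form as written in the source (imitative view)
    reg: str   # regularized form (normalized view)


def tokenize(line: str) -> list[list[CharToken]]:
    """Split a transcription line into words, then each word into characters."""
    return [
        [CharToken(ch, NORMALIZE.get(ch, ch)) for ch in word]
        for word in line.split()
    ]


def render(words: list[list[CharToken]], view: str) -> str:
    """Rebuild the line from either the 'orig' or the 'reg' form of each token."""
    return " ".join("".join(getattr(tok, view) for tok in word) for word in words)


if __name__ == "__main__":
    words = tokenize("le ſeigneur eſt ſaint")
    print(render(words, "orig"))  # le ſeigneur eſt ſaint
    print(render(words, "reg"))   # le seigneur est saint
```

Because "ſ" and "s" both regularize to "s", a transcription that stores only the normalized form cannot recover the source allograph; keeping both forms per token avoids that loss.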
Its use can be extended from representing “a glyph, or a non-standard character” (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-g.html) to distinguishing the allograph in the source from the “regular” character in the edition (Dieu), even when both have a corresponding Unicode code point, so as to adapt the view to the scope of the edition.
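The idea of adapting the view to the edition scope can be sketched in Python. This is a minimal sketch under stated assumptions: the allograph is wrapped in a TEI `g` element (the element documented at the ref-g.html URL), its `ref` value `#s_long` is hypothetical, and the mapping from `ref` to the regular character is a hard-coded, hypothetical table rather than one read from a glyph declaration in the TEI header.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical table mapping a g/@ref value to the "regular" character.
# A real project would more likely derive this from the TEI header.
REGULAR = {"#s_long": "s"}

# Hypothetical sample word: the source has a long s, encoded with <g>.
SAMPLE = '<w xmlns="http://www.tei-c.org/ns/1.0">e<g ref="#s_long">ſ</g>t</w>'


def render(word: ET.Element, view: str) -> str:
    """Render a word either 'diplomatic' (source allograph) or 'normalized'."""
    parts = [word.text or ""]
    for child in word:
        if child.tag == TEI + "g":
            source = child.text or ""
            if view == "normalized":
                # Fall back to the source form if the ref is unknown.
                parts.append(REGULAR.get(child.get("ref", ""), source))
            else:
                parts.append(source)
        else:
            # Any other inline element: keep its text content unchanged.
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    w = ET.fromstring(SAMPLE)
    print(render(w, "diplomatic"))  # eſt
    print(render(w, "normalized"))  # est
```

The same encoded word thus serves both a diplomatic and a normalized edition; only the rendering choice differs.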
Document type: Conference paper
TEI Conference and Members' Meeting, Oct 2015, Lyon, France. pp.77-78, 2015, Book of Abstracts. <http://tei2015.huma-num.fr/fr/papers/>

Contributor: Alexei Lavrentiev
Submitted on: Friday 27 May 2016, 09:45:52
Last modified on: Wednesday 4 January 2017, 10:01:06

HAL Id: halshs-01318710, version 1

Alexei Lavrentiev, Dominique Stutzmann. Encoding Allographs: (ab?)Using the Element. TEI Conference and Members' Meeting, Oct 2015, Lyon, France. pp.77-78, 2015, Book of Abstracts. <http://tei2015.huma-num.fr/fr/papers/>. <halshs-01318710>