Conference paper, Year: 2015

Encoding Allographs: (ab?)Using the <g> Element

Abstract

This paper discusses the optimal TEI format for storing both textual and graphical information about letterforms, from theory to implementation. The Oriflamms research project (http://oriflamms.hypotheses.org) aims to establish an ontology of letterforms in medieval Latin and vernacular writing systems. It uses both previously created and new TEI-encoded manuscript transcriptions, which, however, use slightly different tagsets. These transcriptions are converted to a common, fully TEI-conformant format, tokenized at the word and character levels, and automatically aligned to zones in manuscript images. The research requires defining an optimal TEI format and reassessing the nature of graphical differences in manuscripts and in transcriptions, such as dots, accents, apostrophes (grammar or phonetics [ou/où]), capitals (names, sentence, abbreviation), and letter variants (“allographs”, later specialized according to phonetics, [i/j], [u/v], [ss/ß], or not [s/ſ]). Some input transcriptions apply traditional character normalization rules; others do not. Several solutions may be used at word or letter level (// allowed in , not in ): normalised/imitative/neutral transcriptions with attributes/pointers/elements to give graphical/semantic/phonetic information. Yet the ‘perfect’ format needs ‘perfect’ information, which extant transcriptions do not provide. This paper presents more use cases and an in-depth discussion of encoding strategy and methodological underpinnings. The need for consistency, progressive enhancement, and backward compatibility raises the issue of inline versus stand-off markup and of defining a usable TEI-compliant format. The paper will demonstrate the possible role of the <g> element (in association with ). Its use can be extended from representing “a glyph, or a non-standard character” (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-g.html) to distinguishing the allograph of the source from the “regular” character in the edition (Dieu), even if both have a corresponding Unicode code point, so as to adapt the view according to the edition scope.
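
The idea of adapting the view to the edition scope can be illustrated with a minimal Python sketch. It is not the authors' implementation: the TEI fragment, the `s-long` identifier, the `type="allographic"` value on `<mapping>`, the use of `<charDecl>`/`<glyph>` as the place where the allograph is declared, and the helper names `glyph_table` and `render` are all assumptions made for this example. The sketch wraps a normalized character in `<g>` and renders either the normalized or the allographic reading of a word.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment (not a complete TEI document): the word "est"
# whose "s" is written as a long s in the source manuscript.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <charDecl>
    <glyph xml:id="s-long">
      <mapping type="allographic">ſ</mapping>
    </glyph>
  </charDecl>
  <w>e<g ref="#s-long">s</g>t</w>
</TEI>"""


def glyph_table(root):
    """Map '#id' pointers to the allographic form declared for each glyph."""
    table = {}
    for glyph in root.iter(f"{{{TEI_NS}}}glyph"):
        mapping = glyph.find(f"{{{TEI_NS}}}mapping")
        if mapping is not None and mapping.text:
            table["#" + glyph.get(XML_ID)] = mapping.text
    return table


def render(elem, table, allographic):
    """Return the text of elem; swap <g> content for its allograph on request."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{{{TEI_NS}}}g" and allographic:
            # Fall back to the element's own content if the pointer is unknown.
            parts.append(table.get(child.get("ref"), child.text or ""))
        else:
            parts.append(render(child, table, allographic))
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
table = glyph_table(root)
word = root.find(f"{{{TEI_NS}}}w")
print(render(word, table, allographic=False))  # normalized view: est
print(render(word, table, allographic=True))   # allographic view: eſt
```

In this sketch the content of `<g>` carries the "regular" character, so a processor that ignores `<g>` still yields a readable normalized text, while the pointer gives access to the source allograph.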

Domains

Linguistics
Lavrentiev&Stutzmann_TEI_2015-10-30.pdf (1.74 MB)
Origin: Files produced by the author(s)

Dates and versions

halshs-01318710, version 1 (27-05-2016)

Identifiers

  • HAL Id: halshs-01318710, version 1

Cite

Alexei Lavrentiev, Dominique Stutzmann. Encoding Allographs: (ab?)Using the <g> Element. TEI Conference and Members' Meeting, Oct 2015, Lyon, France. pp.77-78. ⟨halshs-01318710⟩
