Reengineering Akkadian Tablets with TEI and TXM for Linguistic Analysis

Abstract: This paper presents a project in which Akkadian tablets are encoded in TEI for subsequent analysis with the TXM software. The goal of the project is to analyze the vocabulary, spelling and syllabary of a corpus of Akkadian letters, to outline the different Mesopotamian scribal traditions and to understand the complexity of writing a letter. The corpus currently comprises 350 letters written in the Old Babylonian dialect between 2002 BC and 1595 BC. All the letters have been transliterated into Latin characters following the standards established by the Archibab team. The transcriptions, previously stored in a relational database, were encoded in TEI for this project. Every word is tagged with its own element and annotated with @ana. A separate element surrounds every transliterated sign, using @ref to map it to its identification number in Rykle Borger’s syllabary and to its Unicode code point. The transcription also encodes damage and conjecture with the elements <del>, <supplied>, <unclear>, <corr>, <surplus>, etc. Dedicated XSLT stylesheets were designed to preprocess the TEI source transcriptions for import into TXM via a generic XML import module, with tokenization at either the word level or the cuneiform character level, each optimized for different kinds of queries. It is possible, for example, to compare letters by their vocabulary or orthography according to various metadata parameters, to study the different (transliterated) values of the cuneiform signs that are not damaged on the clay tablet, or to obtain a KWIC concordance of the cuneiform signs that the scribe erased while writing the letter. Correspondence analysis makes it possible to identify the vocabulary that is characteristic of a place of composition, a circumstance or a period, and to visualize the similarity or dissimilarity of the letters. A sample corpus will be made available under an open license at the TXM demo portal by the time of the TEI conference.
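
The two tokenization levels described in the abstract can be illustrated with a minimal Python sketch. It is not the project's pipeline, which relies on XSLT stylesheets and the TXM import module. The element names `w` (word) and `g` (sign) are assumptions based on common TEI practice, and the sample fragment, the `@ana` values and the `@ref` identifiers are hypothetical placeholders rather than data from the corpus.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical fragment: <w> and <g> are assumed element names, and the
# @ana / @ref values are invented placeholders, not real corpus data.
SAMPLE = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">'
    '<w ana="#prep"><g ref="#sign-1">a</g>-<g ref="#sign-2">na</g></w> '
    '<w ana="#noun"><g ref="#sign-3">be</g>-'
    '<del><g ref="#sign-4">lí</g></del>-'
    '<unclear><g ref="#sign-5">ia</g></unclear></w>'
    '</ab>'
)

# Elements named in the abstract that mark damage or editorial intervention.
STATUS_TAGS = {"del", "supplied", "unclear", "corr", "surplus"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def word_tokens(root):
    """Word-level tokenization: one (transliteration, @ana) pair per word."""
    return [
        ("".join(w.itertext()), w.get("ana"))
        for w in root.iter(f"{{{TEI_NS}}}w")
    ]


def sign_tokens(elem, status="intact"):
    """Sign-level tokenization: one record per sign, tagged with the nearest
    enclosing damage/conjecture element (or 'intact' if there is none)."""
    tokens = []
    for child in elem:
        name = local_name(child.tag)
        if name == "g":
            tokens.append(
                {"value": child.text or "", "ref": child.get("ref"), "status": status}
            )
        else:
            child_status = name if name in STATUS_TAGS else status
            tokens.extend(sign_tokens(child, child_status))
    return tokens


root = ET.fromstring(SAMPLE)
words = word_tokens(root)
signs = sign_tokens(root)

# Signs erased by the scribe, and signs that are undamaged on the tablet.
erased = [s["value"] for s in signs if s["status"] == "del"]
intact = [s["value"] for s in signs if s["status"] == "intact"]
```

Carrying the status down from the enclosing element is what makes queries such as "signs erased by the scribe" or "undamaged signs only" a simple filter at the sign level, while the word level keeps each transliterated word whole for vocabulary comparison.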
