Wake up, standOff!

Abstract: The paper provides an overview of, and an update on, the ongoing proposal to create a <standOff> component within the TEI architecture. It outlines the conceptual background of embedding stand-off annotations within a TEI document and the consequences in terms of primary source preservation, multiple annotation views, and the possible export of annotation content into autonomous TEI documents. It demonstrates the various types of possible use cases, ranging from manual annotation to fully automated information extraction processes, and shows the importance of implementing, right from the outset, the possibility of using any kind of internal or external vocabulary to represent annotation bodies (e.g. to deal with structural or conceptual annotations). An important prospect here is that the construct could simplify the development of TEI-aware online services such as Named Entity Recognisers. We relate the proposal to ongoing initiatives and show the necessity of aligning it with the Web Annotation Data Model (W3C), as well as with the recent introduction of the element for speech transcription (as part of the work carried out on the ISO standard 24624) as an elementary annotation crystal in the sense of Romary and Wegstein (2012). In this context we tackle the issue of implicitness in the representation of annotations and open the debate on the trade-off between a terse and a highly flexible model. We conclude by illustrating how the current proposal is already being applied in various projects related to data mining or scientific information, in particular to the representation of annotated scholarly content.
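To make the idea of stand-off annotation more concrete, here is a minimal Python sketch, assuming a tokenised TEI document whose `<w>` elements carry `xml:id`s. It appends the output of a hypothetical Named Entity Recogniser to a `<standOff>` element without altering `<text>`, then maps each annotation onto the W3C Web Annotation Data Model (a textual body plus a target with a `TextQuoteSelector`). The elements used inside `<standOff>` (`<spanGrp>` and `<span>` with `@from`/`@to`), the position of `<standOff>` in the document, the sample text, and the `example.org` IRIs are illustrative assumptions, not the syntax of the proposal; the sample document is also not schema-valid.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
T = "{%s}" % TEI_NS
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Minimal (not schema-valid) TEI sample: tokens carry xml:id so that
# annotations can point at them without touching the primary text.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader/>
  <text><body><p><w xml:id="w1">Laurent</w> <w xml:id="w2">Romary</w>
    <w xml:id="w3">spoke</w> <w xml:id="w4">in</w> <w xml:id="w5">Vienna</w></p></body></text>
</TEI>"""

# Hypothetical NER output: (first token id, last token id, label).
ENTITIES = [("w1", "w2", "persName"), ("w5", "w5", "placeName")]


def add_standoff(root, entities, layer):
    """Append one annotation layer to <standOff>, leaving <text> untouched.

    Assumption: <spanGrp>/<span @from @to> and appending <standOff> as the
    last child of <TEI> are illustrative choices, not the proposal's model.
    """
    standoff = root.find(T + "standOff")
    if standoff is None:
        standoff = ET.SubElement(root, T + "standOff")
    group = ET.SubElement(standoff, T + "spanGrp", {"type": layer})
    for first, last, label in entities:
        span = ET.SubElement(group, T + "span",
                             {"from": "#" + first, "to": "#" + last})
        span.text = label
    return group


def to_web_annotation(span, root, source, n):
    """Map one stand-off span onto a W3C Web Annotation (JSON-LD) dict."""
    words = root.findall(f".//{T}text//{T}w")
    ids = [w.get(XML_ID) for w in words]
    start = ids.index(span.get("from").lstrip("#"))
    end = ids.index(span.get("to").lstrip("#"))
    # Recover the annotated string from the untouched primary text.
    exact = " ".join(w.text for w in words[start:end + 1])
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": f"{source}#anno{n}",  # hypothetical annotation IRI
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": span.text,
                 "purpose": "classifying"},
        "target": {"source": source,
                   "selector": {"type": "TextQuoteSelector", "exact": exact}},
    }


root = ET.fromstring(SAMPLE)
text_before = ET.tostring(root.find(T + "text"))
group = add_standoff(root, ENTITIES, layer="ner")
# The primary source is preserved: <text> is byte-for-byte unchanged.
assert ET.tostring(root.find(T + "text")) == text_before
annos = [to_web_annotation(s, root, "http://example.org/doc.xml", i)
         for i, s in enumerate(group, 1)]
print(json.dumps(annos[0], indent=2))
```

Calling `add_standoff` again with a different `layer` (say, part-of-speech tags) adds a second `<spanGrp>` beside the first, which is one simple way to keep several annotation views over the same unmodified primary text.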
Further materials
• Minutes of the January 2014 meeting: http://download2.polytechnic.edu.na/pub7/sourceforge/l/li/lingsig/Documents/Standoff%20in%20Berlin,%2001.2014/standoff-minutesBerlin2014.pdf
• The TEI GitHub ticket: https://github.com/TEIC/TEI/issues/374
• The standOff proposal on GitHub: https://github.com/laurentromary/stdfSpec (branch AnnArbor)

References
• Bański, Piotr (2010). Why TEI standoff annotation doesn’t quite work: and why you might want to use it nevertheless. In Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5.
• ISO/DIS 24624. Language resource management -- Transcription of spoken language.
• Pose, Javier, Patrice Lopez and Laurent Romary (2014). A Generic Formalism for Encoding Stand-off Annotations in TEI.
• Romary, Laurent (2015). TEI challenges in an accelerating digital world. DiXiT Convention Week, Sep 2015, The Hague, Netherlands.
• Romary, Laurent and Werner Wegstein (2012). Consistent Modeling of Heterogeneous Lexical Structures. Journal of the Text Encoding Initiative, Issue 3, November 2012 (online since 15 October 2012, accessed 12 May 2016). URL: http://jtei.revues.org/540; DOI: 10.4000/jtei.540 (section about crystals: https://jtei.revues.org/540#tocfrom2n1)
• W3C. Web Annotation Data Model. https://www.w3.org/TR/annotation-model/
Document type:
Document associated with scientific events
TEI Conference 2016, Sep 2016, Vienna, Austria. <http://tei2016.acdh.oeaw.ac.at>


https://hal.inria.fr/hal-01374102
Contributor: Laurent Romary
Submitted on: Thursday 29 September 2016, 17:05:38
Last modified on: Tuesday 13 December 2016, 15:43:03
Document(s) archived on: Friday 30 December 2016, 14:43:29

Files

WakeUpStandOff.pdf
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id: hal-01374102, version 1

Citation

Piotr Banski, Bertrand Gaiffe, Patrice Lopez, Simon Meoni, Laurent Romary, et al. Wake up, standOff!. TEI Conference 2016, Sep 2016, Vienna, Austria. <http://tei2016.acdh.oeaw.ac.at>. <hal-01374102>
