Problems of linguistic annotation and analysis of electronic critical editions of written heritage texts in the XML-TEI standard

Abstract : This paper considers problems of automatic linguistic annotation and analysis of textual heritage documents encoded according to the TEI XML guidelines. TEI XML is a popular standard for encoding electronic editions of such documents because it allows highly customizable, semantically oriented markup that is independent of any particular platform or software, and because it aims to facilitate data exchange and interoperability. However, rich editorial markup that records alternative readings and interpretations at several levels of the linguistic hierarchy can be a serious challenge when natural language processing (NLP) tools are applied to such an edition. Using the Base de Français Médiéval corpus of Old French and the electronic edition of the Queste del saint Graal as examples, we discuss the solutions to these problems implemented in the import modules of the TXM platform.
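
To make the challenge described in the abstract concrete, here is a minimal Python sketch of one common way to obtain a single linear text from editorial markup before passing it to an NLP tool: for every TEI `<choice>` element, keep exactly one alternative (for example the expanded or corrected form). This is an illustration only and does not reproduce the TXM import modules. The function names `local` and `flatten`, the `preferred` parameter, and the sample sentence are assumptions invented for this example, not taken from the paper or from the Queste del saint Graal edition.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def flatten(elem, preferred=("expan", "corr", "reg")):
    """Return the text of elem, keeping one alternative per <choice>.

    `preferred` lists the child element names to favour inside <choice>;
    if none is present, the first child is used (hypothetical policy).
    """
    children = list(elem)
    if local(elem.tag) == "choice":
        # Whitespace between the alternatives is layout, not content,
        # so only the chosen child contributes text.
        chosen = next(
            (c for c in children if local(c.tag) in preferred),
            children[0] if children else None,
        )
        return flatten(chosen, preferred) if chosen is not None else ""
    parts = [elem.text or ""]
    for child in children:
        parts.append(flatten(child, preferred))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


# Invented sample, for illustration only.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">Li '
    "<choice><abbr>chlr</abbr><expan>chevalier</expan></choice>"
    " vint a la <choice><sic>cort</sic><corr>cour</corr></choice>.</p>"
)

root = ET.fromstring(SAMPLE)
normalized = flatten(root)
diplomatic = flatten(root, preferred=("abbr", "sic", "orig"))
print(normalized)
print(diplomatic)
```

Calling `flatten` with different `preferred` tuples yields different "views" of the same edition (a normalized one and a more diplomatic one here), which is why a tokenizer or tagger has to be told which reading it is processing.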
Document type :
Conference papers

https://halshs.archives-ouvertes.fr/halshs-00759376
Contributor : Alexei Lavrentiev
Submitted on : Tuesday, December 4, 2012 - 11:00:12 AM
Last modification on : Saturday, November 3, 2018 - 4:25:05 PM
Long-term archiving on : Tuesday, March 5, 2013 - 2:50:11 AM

File

Lavrentiev_elmanuscript12.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : halshs-00759376, version 1

Citation

Alexei Lavrentiev. Problems of linguistic annotation and analysis of electronic critical editions of written heritage texts in the XML-TEI standard [original title: Проблемы лингвистической разметки и анализа электронных критических изданий текстов письменного наследия в стандарте XML-TEI]. 4th International conference on information technologies and textual heritage "El'Manuscript-2012", Sep 2012, Petrozavodsk, Russia. pp.150-153. ⟨halshs-00759376⟩