Conference Papers, Year: 2010

The TXM Platform : Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme

Abstract

This paper describes the rationale and design of TXM, a text-mining analysis platform compatible with XML-TEI encoded corpora. The design synthesizes the best available algorithms from existing textometry software. It also relies on identifying the most relevant open-source technologies for processing textual resources encoded in XML and Unicode, for efficient full-text search on annotated corpora, and for statistical data analysis. The architecture is based on a Java toolbox that couples a full-text search engine component with a statistical computing environment and with an original import environment able to process a wide variety of data sources, including XML-TEI, and to apply embedded NLP tools to them. The platform is distributed as an open-source Eclipse project for developers and as two demonstrator applications for end users: a standard application to install on a workstation and an online web application framework.
Main file
paclic24_sheiden.pdf (996.42 KB)
Origin : Files produced by the author(s)

Dates and versions

halshs-00549764 , version 1 (22-12-2010)

Identifiers

  • HAL Id : halshs-00549764 , version 1

Cite

Serge Heiden. The TXM Platform : Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme. 24th Pacific Asia Conference on Language, Information and Computation, Nov 2010, Sendai, Japan. pp.389‑398. ⟨halshs-00549764⟩
