TXM : Une plateforme logicielle open-source pour la textométrie - conception et développement [TXM: An Open-Source Software Platform for Textometry - Design and Development]

Abstract : This paper describes the rationale and design of TXM, a text-mining analysis platform compatible with XML-TEI encoded corpora. The design of the platform is based on a synthesis of the best available algorithms in existing textometry software. It also relies on identifying the most relevant open-source technologies for three tasks: processing textual resources encoded in XML and Unicode, efficient full-text search on annotated corpora, and statistical data analysis. The architecture is based on a Java toolbox that connects a full-text search engine component with a statistical computing environment and with an original import environment. The import environment can process a large variety of data sources, including XML-TEI, and apply embedded NLP tools to them. The platform is distributed as an open-source Eclipse project for developers and as two demonstrator applications for end users: a standard application to install on a workstation and an online web application framework.

Contributor : Serge Heiden
Submitted on : Wednesday, December 22, 2010 - 3:08:57 PM
Last modification on : Tuesday, May 28, 2019 - 5:28:53 PM
Long-term archiving on : Wednesday, March 23, 2011 - 2:33:15 AM


Files produced by the author(s)


  • HAL Id : halshs-00549764, version 1



Serge Heiden. TXM : Une plateforme logicielle open-source pour la textométrie - conception et développement. 10th International Conference on the Statistical Analysis of Textual Data - JADT 2010, Jun 2010, Rome, Italy. pp.1021-1032. ⟨halshs-00549764⟩


