| HAL : halshs-00549764, version 1 |
|
| 24th Pacific Asia Conference on Language, Information and Computation, Sendai : Japan (2010) |
|
| The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme |
|
|
| Serge Heiden 1 |
|
|
| (2010-11-04) |
|
|
| This paper describes the rationale and design of TXM, a text-mining analysis platform compatible with XML-TEI encoded corpora. The design of the platform is based on a synthesis of the best available algorithms in existing textometry software. It also relies on identifying the most relevant open-source technologies for processing textual resources encoded in XML and Unicode, for efficient full-text search on annotated corpora, and for statistical data analysis. The architecture is based on a Java toolbox that couples a full-text search engine component with a statistical computing environment and with an original import environment able to process a large variety of data sources, including XML-TEI, and to apply embedded NLP tools to them. The platform is distributed as an open-source Eclipse project for developers and as two demonstrator applications for end users: a standard application to install on a workstation and an online web application framework. |
|
| 1 : | Interactions, Corpus, Apprentissages, Représentations (ICAR) |
| CNRS : UMR5191 – Université Lumière - Lyon II – Ecole Normale Supérieure Lettres et Sciences Humaines – INRP – École Normale Supérieure - Lyon | |
|
| ICAR3 |
|
| Discipline | : | Humanities and Social Sciences/Methods and statistics; Computer Science/Document and Text Processing; Statistics/Applications; Computer Science/Computation and Language; Computer Science/Digital Libraries; Humanities and Social Sciences/Linguistics |
|
|
| xml-tei corpora – search engine – statistical analysis – textometry – open-source |
|
| halshs-00549764, version 1 | |
| http://halshs.archives-ouvertes.fr/halshs-00549764 | |
| oai:halshs.archives-ouvertes.fr:halshs-00549764 | |
| Contributor : Serge Heiden | |
| Submitted on : Wednesday, 22 December 2010 15:08:57 | |
| Updated on : Friday, 31 December 2010 16:58:18 | |