TyPTex : Inductive typological text classification by multivariate statistical analysis for NLP systems tuning/evaluation

Abstract : The increasing use of natural language processing (NLP) methods based on huge corpora requires that the lexical, morpho-syntactic and syntactic homogeneity of texts be mastered. We have developed a methodology and associated tools for text calibration or "profiling", based on multivariate analysis of linguistic features, within the ELRA benchmark called "Contribution to the construction of contemporary French corpora". We have integrated these tools within a modular architecture based on a generic model that allows, on the one hand, flexible annotation of the corpus with the output of NLP and statistical tools and, on the other hand, tracing the results of these tools back through the annotation layers to the primary textual data. This allows us to justify our interpretations.

https://halshs.archives-ouvertes.fr/halshs-00087993
Contributor : Sophie Prevost
Submitted on : Thursday, July 27, 2006 - 7:15:42 PM
Last modification on : Tuesday, September 17, 2019 - 1:13:22 AM
Long-term archiving on : Monday, April 5, 2010 - 10:14:18 PM

Identifiers

  • HAL Id : halshs-00087993, version 1

Citation

Serge Heiden, Sophie Prévost, Benoît Habert, Helka Folch, Serge Fleury, et al. TyPTex : Inductive typological text classification by multivariate statistical analysis for NLP systems tuning/evaluation. Maria Gavrilidou, George Carayannis, Stella Markantonatou, Stelios Piperidis, Gregory Stainhaouer (eds.), Second International Conference on Language Resources and Evaluation, 2000, pp. 141-148. ⟨halshs-00087993⟩
