TyPTex : Inductive typological text classification by multivariate statistical analysis for NLP systems tuning/evaluation

Serge Heiden; Sophie Prévost; Benoît Habert; Helka Folch; Serge Fleury; Gabriel Illouz; Pierre Lafon; Julien Nioche

Conference paper. Maria Gavrilidou, George Carayannis, Stella Markantonatou, Stelios Piperidis, Gregory Stainhaouer (eds), Second International Conference on Language Resources and Evaluation. Year: 2000

Serge Heiden

Role: Author
PersonId : 7692
IdHAL : serge-heiden
ORCID : 0000-0003-4682-7647
IdRef : 111293383

Interactions, Corpus, Apprentissages, Représentations

Sophie Prévost

Role: Author
PersonId : 11364
IdHAL : sprevost
ORCID : 0000-0003-3623-3482
IdRef : 059781904

Langues, textes, traitement informatique, cognition

Benoît Habert

Role: Author

Langues, textes, traitement informatique, cognition

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Helka Folch

Role: Author

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Serge Fleury

Role: Author
PersonId : 6773
IdHAL : serge-fleury
IdRef : 203040503

SYLED - Systèmes Linguistiques, Énonciation et Discursivité - EA 2290

Gabriel Illouz

Role: Author

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Pierre Lafon

Role: Author

Interactions, Corpus, Apprentissages, Représentations

Julien Nioche

Role: Author

Department of Computer Sciences [Sheffield]

Abstract

The increasing use of natural language processing (NLP) methods based on huge corpora requires that the lexical, morpho-syntactic and syntactic homogeneity of texts be controlled. We have developed a methodology and associated tools for text calibration or "profiling", based on multivariate analysis of linguistic features, within the ELRA benchmark called "Contribution to the construction of contemporary French corpora". We have integrated these tools within a modular architecture based on a generic model. This architecture allows, on the one hand, flexible annotation of the corpus with the output of NLP and statistical tools and, on the other hand, tracing the results of these tools back through the annotation layers to the primary textual data, which allows us to justify our interpretations.
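As a rough illustration of what "profiling" texts by multivariate analysis of linguistic features can look like, the sketch below builds a relative-frequency profile per text, standardizes each feature, and compares texts by Euclidean distance. This is not the TyPTex implementation: the abstract does not specify the statistical technique, and the feature inventory (`FEATURES`), the function names and the toy tag sequences are all assumptions invented for the example.

```python
import math
from collections import Counter

# Hypothetical feature inventory (coarse part-of-speech tags); not from the paper.
FEATURES = ["NOUN", "VERB", "PRON", "ADJ"]

def profile(tags):
    """Relative frequency of each feature in one tagged text."""
    counts = Counter(tags)
    total = len(tags)
    return [counts[f] / total for f in FEATURES]

def standardize(rows):
    """Column-wise z-scores, so that no single feature dominates the distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def distance(a, b):
    """Euclidean distance between two standardized profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy, invented tag sequences standing in for tagger output on three texts.
texts = {
    "news_a": ["NOUN", "NOUN", "VERB", "ADJ", "NOUN", "VERB"],
    "news_b": ["NOUN", "ADJ", "NOUN", "VERB", "NOUN", "NOUN"],
    "dialogue": ["PRON", "VERB", "PRON", "VERB", "NOUN", "PRON"],
}

names = list(texts)
z = standardize([profile(texts[n]) for n in names])
dist = {
    (a, b): distance(z[i], z[j])
    for i, a in enumerate(names)
    for j, b in enumerate(names)
    if i < j
}

for pair, d in sorted(dist.items(), key=lambda item: item[1]):
    print(pair, round(d, 2))
```

With these toy inputs the two noun-heavy texts end up closer to each other than either is to the pronoun-heavy one, which is the kind of grouping a typological classification would then try to interpret. Because every profile value is computed directly from the tag sequence, a result can be traced back to the annotated tokens that produced it.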

Keywords

TyPTex, Inductive typological text classification, multivariate statistical analysis, NLP systems

Domains

Linguistics; Computation and Language [cs.CL]; Computer Science

Full metadata list

Deposit format	File
Deposit type	Conference paper
Title (en)	TyPTex : Inductive typological text classification by multivariate statistical analysis for NLP systems tuning/evaluation
Author(s)	Serge Heiden (1), Sophie Prévost (2), Benoît Habert (2, 3), Helka Folch (3), Serge Fleury (4), Gabriel Illouz (3), Pierre Lafon (1), Julien Nioche (5)
1 ICAR - Interactions, Corpus, Apprentissages, Représentations (51028) - 5, av Pierre Mendès-France 69676 BRON CEDEX - France
	École normale supérieure de Lyon (6818); Université Lumière - Lyon 2 (33804); INRP (300042); Ecole Normale Supérieure Lettres et Sciences Humaines (303652); Centre National de la Recherche Scientifique UMR5191 (441569)
2 LaTTice - Langues, textes, traitement informatique, cognition (1242) - 1 rue Maurice Arnoux 92120 Montrouge - France
	École normale supérieure - Paris (59704); Université Paris Sciences et Lettres (564132); Université Paris Diderot - Paris 7 (300301); Centre National de la Recherche Scientifique UMR8094 (441569)
3 LIMSI - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (247329) - Université Paris-Sud Bât. 507 - Rue du Belvédère - 91405 ORSAY CEDEX - France
	Université Paris-Sud - Paris 11 (92966); Sorbonne Université - UFR d'Ingénierie (408837); Sorbonne Université (413221); Université Paris-Saclay (419361); Centre National de la Recherche Scientifique UPR3251 (441569); Université Paris Saclay (COmUE) (562112)
4 SYLED - Systèmes Linguistiques, Énonciation et Discursivité - EA 2290 (107737) - 13 rue de Santeuil 75231 PARIS Cedex 05 - France
	Université Sorbonne Nouvelle - Paris 3 (52995)
5 Department of Computer Sciences [Sheffield] (90620) - The Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UNITED KINGDOM. Tel: +44 (0) 114 222 1800 Fax: +44 (0) 114 222 1810 - United Kingdom
	University of Sheffield [Sheffield] (153591)
Popular science	No
Peer-reviewed	Yes
Proceedings	Yes
Invited	No
Document language	English
Journal name	Maria Gavrilidou, George Carayannis, Stella Markantonatou, Stelios Piperidis, Gregory Stainhaouer (eds) Second International Conference on Language Resources and Evaluation
Book title	Maria Gavrilidou, George Carayannis, Stella Markantonatou, Stelios Piperidis, Gregory Stainhaouer (eds) Second International Conference on Language Resources and Evaluation
Audience	Not specified
Publication date	2000
Page/Identifier	p. 141-148
Conference title	Maria Gavrilidou, George Carayannis, Stella Markantonatou, Stelios Piperidis, Gregory Stainhaouer (eds) Second International Conference on Language Resources and Evaluation
Conference start date	2000
Domain(s)	Humanities and Social Sciences/Linguistics; Cognitive science/Linguistics; Computer Science [cs]/Computation and Language [cs.CL]; Cognitive science/Computer science
Keywords (en)	TyPTex, Inductive typological text classification, multivariate statistical analysis, NLP systems

Main file

prevost-biblio9.pdf (357.69 KB)

Contributor: Sophie Prévost

https://shs.hal.science/halshs-00087993

Submitted on: Thursday, 27 July 2006 at 19:15:42

Last modified on: Friday, 19 April 2024 at 16:18:55

Long-term archiving on: Monday, 5 April 2010 at 22:14:18

Dates and versions

halshs-00087993, version 1 (27-07-2006)

Identifiers

HAL Id: halshs-00087993, version 1

Cite

Serge Heiden, Sophie Prévost, Benoît Habert, Helka Folch, Serge Fleury, et al. TyPTex : Inductive typological text classification by multivariate statistical analysis for NLP systems tuning/evaluation. Maria Gavrilidou, George Carayannis, Stella Markantonatou, Stelios Piperidis, Gregory Stainhaouer (eds), Second International Conference on Language Resources and Evaluation, 2000, p. 141-148. ⟨halshs-00087993⟩


Collections

ENS-LYON UNIV-PARIS7 ENS-PARIS CNRS UNIV-LYON2 UNIV-PARIS3 LATTICE ICAR LIMSI CAMPUS-AAR AAI PSL UNIV-PARIS-SACLAY CLESTHIA SORBONNE-UNIVERSITE UDL LISN GS-SPORT-HUMAN-MOVEMENT

585 views

792 downloads

Last updated on 20/04/2024
