Can word vectors help corpus linguists?

Guillaume Desagulier

doi:10.1080/00393274.2019.1616220

Journal article, Studia Neophilologica, Year: 2019

Les vecteurs lexicaux peuvent-ils venir en aide aux linguistes de corpus ?

Guillaume Desagulier (1, 2)

Role: Author
PersonId: 16087
IdHAL: guillaume-desagulier
ORCID: 0000-0003-4895-0788
IdRef: 095062742

1 Modèles, Dynamiques, Corpus
2 Institut universitaire de France

Abstract

Two recent methods based on distributional semantic models (DSMs) have proved very successful in learning high-quality vector representations of words from large corpora: word2vec (Mikolov, Chen, et al. 2013; Mikolov, Yih, et al. 2013) and GloVe (Pennington et al. 2014). Once trained on a very large corpus, these algorithms produce distributed representations for words in the form of vectors. DSMs based on deep learning and neural networks have proved efficient in representing the meaning of individual words. In this paper, I assess to what extent state-of-the-art word-vector semantics can help corpus linguists annotate large datasets for semantic classes. Although word vectors offer decisive opportunities for resolving semantic annotation issues, the approach has yet to improve in its representation of polysemy, homonymy, and multiword expressions.
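As a rough illustration of how word vectors might support semantic-class annotation, the sketch below assigns a word to the class whose seed words lie closest to it by cosine similarity. It is not the procedure used in the paper. The three-dimensional vectors, the class labels `ANIMAL` and `MOTION`, and the helper names `cosine`, `centroid`, and `suggest_class` are all invented for this example. Real word2vec or GloVe vectors would be loaded from a trained model instead.

```python
import math

# Toy vectors invented for illustration only. Real word2vec or GloVe vectors
# are learned from a corpus and usually have tens to hundreds of dimensions.
TOY_VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "horse": [0.7, 0.3, 0.0],
    "run": [0.1, 0.9, 0.2],
    "walk": [0.2, 0.8, 0.1],
    "jump": [0.0, 0.9, 0.3],
}

# Hypothetical seed words that an annotator has already labelled by hand.
SEED_CLASSES = {
    "ANIMAL": ["dog", "cat"],
    "MOTION": ["run", "walk"],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(words, vectors):
    """Component-wise mean of the vectors of the given words."""
    rows = [vectors[w] for w in words]
    return [sum(column) / len(rows) for column in zip(*rows)]


def suggest_class(word, vectors=TOY_VECTORS, seeds=SEED_CLASSES):
    """Return the best-scoring class label and all per-class similarity scores."""
    scores = {
        label: cosine(vectors[word], centroid(members, vectors))
        for label, members in seeds.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    for unlabelled in ("horse", "jump"):
        label, scores = suggest_class(unlabelled)
        print(unlabelled, label, scores)
```

A single vector per word form is also why such a scheme would struggle with the cases the abstract mentions: a polysemous or homonymous word gets one vector for all its senses, and a multiword expression has no entry unless it was treated as one token at training time.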

Keywords

corpus linguistics, distributional semantic models, word vectors, semantic annotation

Domains

Linguistics; Methods and statistics

Full metadata list

Deposit format	File
Deposit type	Journal article
Title	en: Can word vectors help corpus linguists?; fr: Les vecteurs lexicaux peuvent-ils venir en aide aux linguistes de corpus ?
Author(s)	Guillaume Desagulier (1, 2). 1: MoDyCo - Modèles, Dynamiques, Corpus (1057), Université Paris Nanterre, Bâtiment A, Bureau 402 A, 200 avenue de la République, 92001 Nanterre Cedex, France; Université Paris Nanterre UMR7114 (116205); Centre National de la Recherche Scientifique UMR7114 (441569). 2: IUF - Institut universitaire de France (56663), Maison des Universités, 103 Boulevard Saint-Michel, 75005 Paris, France; Ministère de l'Education nationale, de l'Enseignement supérieur et de la Recherche (301855)
Document language	English
Date of production/writing	2019-04-24
Journal name	Studia Neophilologica (ISSN: 0039-3274), published by Taylor & Francis (Routledge): SSH Titles
Popular science	No
Peer-reviewed	Yes
Audience	International
Publication date	2019-07-22
Domain(s)	Humanities and Social Sciences/Linguistics; Humanities and Social Sciences/Methods and statistics
Keywords	en: corpus linguistics, distributional semantic models, word vectors, semantic annotation
DOI	10.1080/00393274.2019.1616220

Main file

wordvecs.pdf (622.44 KB)

Origin: Files produced by the author(s)

https://shs.hal.science/halshs-01657591

Submitted on: Wednesday 3 October 2018 at 12:56:23

Last modified on: Monday 15 April 2024 at 11:25:23

Dates and versions

halshs-01657591, version 1 (06-12-2017)

halshs-01657591, version 2 (03-10-2018)

Identifiers

HAL Id: halshs-01657591, version 2
DOI: 10.1080/00393274.2019.1616220

Cite

Guillaume Desagulier. Can word vectors help corpus linguists? Studia Neophilologica, 2019, ⟨10.1080/00393274.2019.1616220⟩. ⟨halshs-01657591v2⟩

Collections

CNRS MODYCO UNIV-PARIS-LUMIERES UNIV-PARIS-NANTERRE

481 views

1490 downloads

Last updated on 28/04/2024
