Can word vectors help corpus linguists?

Abstract: Two recent methods based on distributional semantic models (DSMs) have proved very successful in learning high-quality vector representations of words from large corpora: word2vec (Mikolov, Chen, et al. 2013; Mikolov, Yih, et al. 2013) and GloVe (Pennington et al. 2014). Once trained on a very large corpus, these algorithms produce distributed representations of words in the form of vectors. Such DSMs, based on deep learning and neural networks, have proved efficient at representing the meaning of individual words. In this paper, I assess to what extent state-of-the-art word-vector semantics can help corpus linguists annotate large datasets for semantic classes. Although word vectors offer decisive opportunities for resolving semantic annotation issues, they have yet to improve in their representation of polysemy, homonymy, and multiword expressions.
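
As a rough illustration of the workflow the abstract describes (not the paper's own pipeline), the Python sketch below trains a word2vec model with gensim on a plain-text corpus and queries nearest neighbours as candidate members of a semantic class for manual annotation; the corpus file name, hyperparameters, and query word are hypothetical.

# A minimal sketch, assuming gensim 4.x; "corpus.txt", the hyperparameters,
# and the query word are illustrative placeholders, not the paper's setup.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Assume one sentence per line in a plain-text corpus file (hypothetical path).
with open("corpus.txt", encoding="utf-8") as f:
    sentences = [simple_preprocess(line) for line in f]

# Train a skip-gram word2vec model (sg=1); parameter values are illustrative only.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, sg=1)

# Words whose vectors lie closest to "teacher" might seed an occupation class
# that a corpus linguist could then vet by hand.
print(model.wv.most_similar("teacher", topn=10))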
Document type:
Preprint, working paper
2017

Cited literature: 41 references

https://halshs.archives-ouvertes.fr/halshs-01657591
Contributor: Guillaume Desagulier
Submitted on: Wednesday, December 6, 2017 - 21:56:59
Last modified on: Tuesday, January 23, 2018 - 12:54:05

File

wordvecs.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: halshs-01657591, version 1

Citation

Guillaume Desagulier. Can word vectors help corpus linguists?. 2017. 〈halshs-01657591〉

Metrics

Record views: 250
File downloads: 185