Improving Automatic Categorization of Technical vs. Laymen Medical Words using FastText Word Embeddings

Abstract: Detecting words that are difficult to understand is a crucial task for ensuring that medical texts such as diagnoses and drug instructions are properly understood. In this paper, we study the use of recently developed word embeddings, which encode context information for words, together with other linguistic and non-linguistic features, to improve the detection of difficult medical words. We propose new cross-validation scenarios to test, from different perspectives, how well the detection of medical word difficulty generalizes, and we provide an experimental study of previously used feature extraction methods together with the recently proposed FastText embeddings. We found that, for known words and unknown users, FastText embeddings improve the detection of word understandability, reaching an F-score of 85.9 (an improvement of up to 2.9 F-score points).
Document type:
Conference paper
1st International Workshop on Informatics & Data-Driven Medicine (IDDM 2018), Nov 2018, Lviv, Ukraine

https://halshs.archives-ouvertes.fr/halshs-01968357
Contributor: Natalia Grabar
Submitted on: Wednesday, January 2, 2019 - 15:47:38
Last modified on: Tuesday, February 12, 2019 - 01:30:15

File

pylieva-IDDM2018.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : halshs-01968357, version 1

Citation

Hanna Pylieva, Artem Chernodub, Natalia Grabar, Thierry Hamon. Improving Automatic Categorization of Technical vs. Laymen Medical Words using FastText Word Embeddings. 1st International Workshop on Informatics & Data-Driven Medicine (IDDM 2018), Nov 2018, Lviv, Ukraine. 〈halshs-01968357〉
