Improving Automatic Categorization of Technical vs. Laymen Medical Words using FastText Word Embeddings

Abstract : Detecting words that are difficult to understand is crucial for ensuring that medical texts such as diagnoses and drug instructions are properly understood. In this paper, we study the use of recently developed word embeddings, which encode the context of words, together with other linguistic and non-linguistic features, to improve the detection of difficult medical words. We propose new cross-validation scenarios that test the generalization ability of medical word difficulty detection from different perspectives, and we provide an experimental comparison of previously used feature extraction methods with the recently proposed FastText embeddings. We found that, for known words and unknown users, FastText embeddings improve the detection of word understandability, reaching an F-score of 85.9 (an improvement of up to 2.9 F-score points).

https://halshs.archives-ouvertes.fr/halshs-01968357
Contributor : Natalia Grabar
Submitted on : Wednesday, January 2, 2019 - 3:47:38 PM
Last modification on : Saturday, March 16, 2019 - 1:55:45 AM
Document(s) archived on : Wednesday, April 3, 2019 - 4:09:31 PM

File

pylieva-IDDM2018.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : halshs-01968357, version 1

Citation

Hanna Pylieva, Artem Chernodub, Natalia Grabar, Thierry Hamon. Improving Automatic Categorization of Technical vs. Laymen Medical Words using FastText Word Embeddings. 1st International Workshop on Informatics & Data-Driven Medicine (IDDM 2018), Nov 2018, Lviv, Ukraine. ⟨halshs-01968357⟩

Metrics

Record views : 37
File downloads : 57