Weakly-supervised Symptom Recognition for Rare Diseases in Biomedical Text
Pierre Holat (1, 2), Nadi Tomeh (1, 2), Thierry Charnois (1, 2), Delphine Battistelli (3, 4), Marie-Christine Jaulent (5), Jean-Philippe Metivier (6, 7)
1 LIPN - Laboratoire d'Informatique de Paris-Nord
2 UP13 - Université Paris 13
3 MoDyCo - Modèles, Dynamiques, Corpus
4 UPN - Université Paris Nanterre
5 INSERM - Institut National de la Santé et de la Recherche Médicale
6 Equipe CODAG - Laboratoire GREYC - UMR6072
7 UNICAEN - Université de Caen Normandie
Thierry Charnois
- Role: Author
- PersonId: 741393
- IdHAL: charnois
- ORCID: 0000-0001-9700-5075
- IdRef: 168705117
Delphine Battistelli
- Role: Author
- PersonId: 89
- IdHAL: delphine-battistelli
- IdRef: 060895217
Marie-Christine Jaulent
- Role: Author
- PersonId: 1160040
- IdHAL: marie-christine-jaulent
- ORCID: 0000-0003-4445-7494
- IdRef: 031210619
Jean-Philippe Metivier
- Role: Author
- PersonId: 941790
Abstract
In this paper, we tackle the problem of symptom recognition for rare diseases in biomedical texts. Symptoms typically have a more complex and ambiguous structure than other biomedical named entities. Furthermore, existing resources are scarce and incomplete. We therefore propose a weakly-supervised framework that combines two approaches: sequential pattern mining under constraints and sequence labeling. To create our training data, we use unannotated abstracts of biomedical papers together with dictionaries of rare diseases and symptoms. Our experiments show that both approaches outperform a simple projection of the dictionaries onto the text, and that combining them is beneficial. We also introduce a novel pattern mining constraint based on the semantic similarity between words inside patterns.
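To make the baseline mentioned in the abstract more concrete, here is a minimal sketch of what projecting a symptom dictionary onto tokenized text could look like. It is an illustration under stated assumptions, not the authors' implementation: the function name `project_dictionary`, the BIO label scheme, the greedy longest-match strategy, the toy lexicon, and the example sentence are all hypothetical.

```python
from typing import List, Set, Tuple


def project_dictionary(
    tokens: List[str],
    lexicon: Set[Tuple[str, ...]],
    max_len: int = 5,
) -> List[str]:
    """Tag tokens with BIO labels using greedy longest-match dictionary lookup.

    Assumption: lexicon entries are lowercased token tuples; matching is
    case-insensitive and exact (no stemming or fuzzy matching).
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span first so multi-word entries win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in lexicon:
                match_len = n
                break
        if match_len:
            labels[i] = "B-SYMPTOM"
            for j in range(i + 1, i + match_len):
                labels[j] = "I-SYMPTOM"
            i += match_len
        else:
            i += 1
    return labels


# Hypothetical toy lexicon and sentence, for illustration only.
symptom_lexicon = {("muscle", "weakness"), ("fatigue",)}
sentence = "Patients reported muscle weakness and fatigue .".split()
labels = project_dictionary(sentence, symptom_lexicon)

for token, label in zip(sentence, labels):
    print(f"{token}\t{label}")
```

Labels produced this way are noisy and incomplete, since any symptom missing from the dictionary stays tagged `O`. Using them as weak supervision for a pattern miner or a sequence labeler, rather than as the final output, is one way to read the framework described in the abstract.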
Domains
Linguistics

Deposit format | File |
---|---|
Deposit type | Conference paper |
Title | Weakly-supervised Symptom Recognition for Rare Diseases in Biomedical Text (en) |
Author(s) | Pierre Holat (1, 2), Nadi Tomeh (1, 2), Thierry Charnois (1, 2), Delphine Battistelli (3, 4), Marie-Christine Jaulent (5), Jean-Philippe Metivier (6, 7) |
Affiliation 1 | LIPN - Laboratoire d'Informatique de Paris-Nord (994), Institut Galilée, Université Paris 13, 99 avenue Jean-Baptiste Clément, F-93430, Villetaneuse, France |
Affiliation 2 | UP13 - Université Paris 13 (15786), France |
Affiliation 3 | MoDyCo - Modèles, Dynamiques, Corpus (1057), Université Paris Nanterre, Bâtiment A - Bureau 402 A, 200 avenue de la République, 92001 Nanterre Cedex, France |
Affiliation 4 | UPN - Université Paris Nanterre (116205), 200 avenue de la République, 92001 Nanterre Cedex, France |
Affiliation 5 | INSERM - Institut National de la Santé et de la Recherche Médicale (303623), 101, rue de Tolbiac, 75013 Paris, France |
Affiliation 6 | Equipe CODAG - Laboratoire GREYC - UMR6072 (388594), France |
Affiliation 7 | UNICAEN - Université de Caen Normandie (7127), Esplanade de la Paix - CS 14032 - 14032 CAEN Cedex 5, France |
Source | 15th International Symposium on Intelligent Data Analysis |
Document language | English |
Popular science | No |
Peer-reviewed | Yes |
Invited | No |
Audience | International |
Proceedings | Yes |
Publication date | 2016 |
Conference title | 15th International Symposium on Intelligent Data Analysis |
Conference start date | 2016-10-13 |
Conference end date | 2016-10-15 |
City | Stockholm |
Country | Sweden |
Domain(s) | |
Keywords | Information extraction, Pattern mining, CRF, Symptoms recognition, Biomedical texts (en) |
Origin: Files produced by the author(s)