CLEAR-Simple Corpus for Medical French

Abstract: The availability of corpora with technical and simplified content is crucial for developing and testing text simplification methods. We describe such a corpus for French medical language. The corpus contains texts from three sources: an encyclopedia, drug leaflets, and scientific summaries. Each source provides comparable information in specialized and plain language. A subset of this corpus has been manually processed to find and align parallel sentences. This subset currently contains 663 pairs of parallel sentences. The alignment was performed by two annotators, with an inter-annotator agreement of 0.76.
Document type:
Conference paper
ATA, Nov 2018, Tilburg, Netherlands

https://halshs.archives-ouvertes.fr/halshs-01968355
Contributor: Natalia Grabar
Submitted on: Wednesday, January 2, 2019 - 15:43:03
Last modified on: Thursday, February 7, 2019 - 15:36:47

File

grabar-ATA2018c.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: halshs-01968355, version 1

Collections

STL

Citation

Natalia Grabar, Rémi Cardon. CLEAR-Simple Corpus for Medical French. ATA, Nov 2018, Tilburg, Netherlands. 〈halshs-01968355〉
