Conference paper. Year: 2015

GLAWI, a free XML-encoded Machine-Readable Dictionary built from the French Wiktionary

Abstract

This article introduces GLAWI, a large XML-encoded machine-readable dictionary automatically extracted from Wiktionnaire, the French edition of Wiktionary. GLAWI contains 1,341,410 articles and is released under a free license. Besides the size of its headword list, GLAWI inherits from Wiktionnaire its original macrostructure and the richness of its lexicographic descriptions: articles contain etymologies, definitions, usage examples, inflectional paradigms, lexical relations and phonemic transcriptions. The paper first gives some insight into the nature and content of Wiktionnaire, with a particular focus on its encoding format, before presenting our approach, the standardization of its microstructure and the conversion into XML. Originally intended to meet NLP needs, GLAWI has been used to create a number of customized lexicons dedicated to specific uses, including linguistic description and psycholinguistics. The main one is GLÀFF, a large inflectional and phonological lexicon of French. We show that many more specific on-demand lexicons can easily be derived from the large body of lexical knowledge encoded in GLAWI.
Main file
Sajous_Hathout_ELEX2015_GLAWI.pdf (1.05 MB)
Origin: Files produced by the author(s)

Dates and versions

halshs-01191012, version 1 (01-09-2015)

Identifiers

  • HAL Id: halshs-01191012, version 1

Cite

Franck Sajous, Nabil Hathout. GLAWI, a free XML-encoded Machine-Readable Dictionary built from the French Wiktionary. eLex, Aug 2015, Herstmonceux, United Kingdom. ⟨halshs-01191012⟩
306 views
403 downloads
Last updated on 20/04/2024
