Conference paper. Year: 2017

Extracting an Etymological Database from Wiktionary

Abstract

Electronic lexical resources almost never contain etymological information. The availability of such information, if properly formalised, could open up the possibility of developing automatic tools targeted towards historical and comparative linguistics, as well as significantly improving the automatic processing of ancient languages. We describe here the process we implemented for extracting etymological data from the etymological notices found in Wiktionary. We have produced a multilingual database of nearly one million lexemes and a database of more than half a million etymological relations between lexemes.
Main file
paper44.pdf (709.97 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01592061, version 1 (22-09-2017)

Identifiers

  • HAL Id: hal-01592061, version 1

Cite

Benoît Sagot. Extracting an Etymological Database from Wiktionary. Electronic Lexicography in the 21st century (eLex 2017), Sep 2017, Leiden, Netherlands. pp.716-728. ⟨hal-01592061⟩

Collections

INRIA INRIA2 ANR
532 views
1084 downloads
