Développement de ressources pour le persan: PerLex2, nouveau lexique morphologique et MElt_fa, étiqueteur morphosyntaxique

Abstract : We present a new version of PerLex, a morphological lexicon for the Persian language; a corrected and partially re-annotated version of the BijanKhan corpus (BijanKhan, 2004); and MElt_fa, a new freely available POS tagger for Persian. Following the first version of PerLex (Sagot & Walther, 2010), we propose an improved version of the lexicon: in addition to being partially validated by hand, PerLex 2 now relies on a set of linguistically motivated POS. Based on this POS set, we also developed a new version of the BijanKhan corpus, with significant corrections to the tokenisation, and re-tagged it accordingly. This new version of the corpus was then used to develop MElt_fa, which is based on the new POS set, PerLex 2 and the MElt tagging system (Denis & Sagot, 2009).
Document type :
Conference papers

https://halshs.archives-ouvertes.fr/halshs-00751630
Contributor : Géraldine Walther
Submitted on : Wednesday, November 14, 2012 - 3:57:52 PM
Last modification on : Friday, March 15, 2019 - 9:44:01 AM
Long-term archiving on : Saturday, December 17, 2016 - 10:26:39 AM

File

taln11pergramshort.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : halshs-00751630, version 1

Citation

Benoît Sagot, Géraldine Walther, Pegah Faghiri, Pollet Samvelian. Développement de ressources pour le persan: PerLex2, nouveau lexique morphologique et MElt_fa, étiqueteur morphosyntaxique. TALN 2011, 2011, Montpellier, France. ⟨halshs-00751630⟩
