From the corpus to the lexicon: the example of data models for verb subcategorization. - HAL-SHS - Sciences de l'Homme et de la Société
Conference paper. Workshop on Syntactically Annotated Corpora, Corpus Linguistics 2005 Conference. Year: 2005

From the corpus to the lexicon: the example of data models for verb subcategorization.

Abstract

This paper describes the integration of corpus-based syntactic subcategorization frames and correlated semantic
information into a large-scale, cross-theoretically informed lexical database for French
(Romary et al., 2004). This database is the first to implement the Lexical Markup Framework (LMF), an international
initiative towards ISO standards for lexical databases (ISO TC 37/SC 4). The subcategorization
frames have been acquired via a dependency-based parser (Bick, 2003), whose verb lexicon is currently incomplete
with respect to subcategorization frames. Therefore, we have implemented probabilistic filtering as
a post-parsing treatment using the binomial distribution. Building on our discussion of what semantic information,
e.g., participant roles, to include in the database, we describe how we plan to exploit our findings on
subcategorization frames to derive this information via unsupervised learning techniques.
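
The abstract mentions probabilistic filtering with the binomial distribution but gives no parameters. As a rough illustration only, the sketch below shows one common way such a filter can work: a frame is kept for a verb when its observed count would be unlikely if the frame were only parser noise. The function names, the error rate, the significance threshold, the frame labels and the counts are all assumptions made for this example and are not taken from the paper.

```python
from math import comb


def binomial_tail(n: int, m: int, p: float) -> float:
    """Return P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def filter_frames(frame_counts, verb_total, error_rate=0.05, alpha=0.05):
    """Keep frames whose count is unlikely to arise from parser noise alone.

    error_rate (assumed): probability that the parser wrongly assigns a frame.
    alpha (assumed): significance threshold for accepting a frame.
    """
    kept = {}
    for frame, count in frame_counts.items():
        # Probability of seeing at least `count` spurious assignments
        # in `verb_total` occurrences of the verb.
        p_value = binomial_tail(verb_total, count, error_rate)
        if p_value < alpha:
            kept[frame] = p_value
    return kept


# Hypothetical frame counts for one verb observed 100 times in a parsed corpus.
counts = {"NP", "NP_PP_a", "S_que"} and {"NP": 40, "NP_PP_a": 12, "S_que": 2}
accepted = filter_frames(counts, verb_total=100)
print(sorted(accepted))
```

With these made-up counts, the two frequent frames pass the filter and the frame seen only twice is rejected as probable noise. In practice the error rate would have to be estimated for the parser at hand, possibly per frame.
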
Main file
final.chesley_alt_subcats_lmf.pdf (108.49 KB)

Dates and versions

halshs-00004100, version 1 (12-07-2005)

Identifiers

  • HAL Id: halshs-00004100, version 1

Cite

Paula Chesley, Susanne Salmon-Alt. From the corpus to the lexicon: the example of data models for verb subcategorization. Workshop on Syntactically Annotated Corpora, Corpus Linguistics 2005 Conference, Jul 2005, Birmingham, United Kingdom. ⟨halshs-00004100⟩