HAL-SHS - Sciences de l'Homme et de la Société
Journal article in Behavior Research Methods Instruments and Computers, Year: 2000

Automatic disambiguation of morphosyntax in spoken language corpora

Christophe Parisse
Marie-Thérèse Le Normand

Abstract

The use of computer tools has led to major advances in the study of spoken language corpora. One area that has shown particular progress is the study of child language development. Although it is now easy to lexically tag every word in a spoken language corpus, one still has to choose between numerous ambiguous forms, especially with languages such as French or English, where more than 70% of words are ambiguous. Computational linguistics can now provide a fully automatic disambiguation of lexical tags. The tool presented here (POST) can tag and disambiguate a large text in a few seconds. This tool complements systems dealing with language transcription, and also suggests further theoretical developments in the assessment of the status of morphosyntax in spoken language corpora. The program currently works for French and English, but can be easily adapted for use with other languages. The analysis and computation of a corpus produced by normal French children aged two to four, as well as of a sample corpus produced by French SLI children, are given as examples.
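To make the idea of disambiguating lexical tags concrete, here is a minimal Python sketch. It is not the POST algorithm, which this abstract does not describe. The lexicon, the tag names, and the bigram counts are all invented for illustration, and the Viterbi-style search over tag bigrams is just one common way to choose among candidate tags.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: each word form maps to its candidate tags.
LEXICON: Dict[str, List[str]] = {
    "the": ["det"],
    "dog": ["n", "v"],
    "can": ["aux", "n", "v"],
    "run": ["v", "n"],
}

# Hypothetical tag-bigram counts, as if gathered from a hand-checked sample.
BIGRAMS: Dict[Tuple[str, str], int] = {
    ("<s>", "det"): 10,
    ("det", "n"): 9,
    ("det", "v"): 1,
    ("n", "aux"): 6,
    ("n", "v"): 3,
    ("n", "n"): 1,
    ("aux", "v"): 8,
    ("aux", "n"): 1,
}


def ambiguity_rate(tokens: List[str]) -> float:
    """Share of tokens that have more than one candidate tag in the lexicon."""
    ambiguous = sum(1 for tok in tokens if len(LEXICON[tok]) > 1)
    return ambiguous / len(tokens)


def disambiguate(tokens: List[str]) -> List[str]:
    """Pick one tag per token by maximizing the product of add-one-smoothed
    bigram counts over the whole sequence (a Viterbi-style search)."""
    # best[tag] = (score of the best path ending in tag, that path)
    best: Dict[str, Tuple[float, List[str]]] = {"<s>": (1.0, [])}
    for tok in tokens:
        step: Dict[str, Tuple[float, List[str]]] = {}
        for tag in LEXICON[tok]:
            # Choose the previous tag that gives this tag its highest score.
            score, path = max(
                (
                    (prev_score * (BIGRAMS.get((prev, tag), 0) + 1), prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            step[tag] = (score, path + [tag])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


sentence = ["the", "dog", "can", "run"]
print(ambiguity_rate(sentence))  # 0.75: three of the four tokens are ambiguous
print(disambiguate(sentence))    # ['det', 'n', 'aux', 'v']
```

In this toy sentence, three of four word forms carry more than one candidate tag, and the surrounding tags are enough to settle each choice.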

Domains

Linguistics

Main file

brimc-2000.pdf (68.85 KB)

Dates and versions

halshs-00102702, version 1 (2 October 2006)

Identifiers

  • HAL Id: halshs-00102702, version 1

Cite

Christophe Parisse, Marie-Thérèse Le Normand. Automatic disambiguation of morphosyntax in spoken language corpora. Behavior Research Methods Instruments and Computers, 2000, 32 (3), pp. 468-481. ⟨halshs-00102702⟩
397 views
285 downloads
