Journal article in Signal Processing, 2006

Visual perception, language and gesture: A model for their understanding in multimodal dialogue systems

Abstract

The way we see the objects around us determines the speech and gestures we use to refer to them. The gestures we produce structure our visual perception, and the words we use influence the way we see. Visual perception, language and gesture thus interact in multiple ways. The problem is global and has to be tackled as a whole in order to understand the complexity of reference phenomena and to deduce a formal model. Such a model may be useful for any human-machine dialogue system that aims at deep comprehension. We show how a referring act takes place within a contextual subset of objects. This subset, called the `reference domain', is implicit and can be deduced from many clues, including those that come from the visual context and those that come from the multimodal utterance. We present the `multimodal reference domain' model, which takes these clues into account and can be exploited by a multimodal dialogue system during interpretation.
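The abstract's central idea can be illustrated with a toy sketch: a referring act such as "this red one" plus a pointing gesture is resolved within a reference domain, i.e. a subset of the visible objects filtered by gestural and linguistic clues. The data structures and filtering functions below are hypothetical illustrations, not the authors' formalism.

```python
# Toy illustration of resolving a referring act inside a "reference domain":
# a contextual subset of objects deduced from visual, gestural and
# linguistic clues. All names here are hypothetical.

from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    color: str
    x: float  # position along a 1-D axis, for simplicity

def reference_domain(objects, gesture_span=None, spoken_color=None):
    """Restrict the visual context using gesture and language clues."""
    domain = list(objects)
    if gesture_span is not None:                  # gestural clue: pointed region
        lo, hi = gesture_span
        domain = [o for o in domain if lo <= o.x <= hi]
    if spoken_color is not None:                  # linguistic clue: color word
        domain = [o for o in domain if o.color == spoken_color]
    return domain

scene = [SceneObject("cube", "red", 0.2),
         SceneObject("ball", "red", 0.8),
         SceneObject("cone", "blue", 0.7)]

# "this red one" + a pointing gesture toward the right part of the scene
domain = reference_domain(scene, gesture_span=(0.5, 1.0), spoken_color="red")
print([o.name for o in domain])  # → ['ball']
```

Either clue alone would leave the referent ambiguous (two red objects; two objects in the pointed region); only their combination yields a singleton domain, which is the kind of interaction between modalities the paper models.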

Dates and versions

halshs-00137947, version 1 (22-03-2007)

Identifiers

  • HAL Id : halshs-00137947 , version 1

Cite

Frédéric Landragin. Visual perception, language and gesture: A model for their understanding in multimodal dialogue systems. Signal Processing, 2006, 86 (12), pp.3578-3595. ⟨halshs-00137947⟩