Perception in Multimodal Dialogue Systems

Perception in Multimodal Dialogue Systems

Author: Elisabeth André

Publisher: Springer Science & Business Media

Published: 2008-06-11

Total Pages: 320

ISBN-10: 3540693688

This book constitutes the refereed proceedings of the 4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems, PIT 2008, held in Kloster Irsee, Germany, in June 2008. The 37 revised full papers presented together with 1 invited keynote lecture were carefully selected from numerous submissions for inclusion in the book. The papers are organized in topical sections on multimodal and spoken dialogue systems, classification of dialogue acts and sound, recognition of eye gaze, head poses, mimics and speech as well as combinations of modalities, vocal emotion recognition, human-like and social dialogue systems, and evaluation methods for multimodal dialogue systems.

Perception in Multimodal Dialogue Systems

Author: Elisabeth André

Publisher: Springer

Published: 2009-08-29

Total Pages: 311

ISBN-13: 9783540865421

Advances in Natural Multimodal Dialogue Systems

Author: Jan van Kuppevelt

Publisher: Springer Science & Business Media

Published: 2006-06-28

Total Pages: 376

ISBN-10: 1402039336

The main topic of this volume is natural multimodal interaction. The book is unique in that it brings together a large number of contributions on aspects of natural and multimodal interaction, written by many of the leading researchers in the field. Topics addressed include talking heads, conversational agents, tutoring systems, multimodal communication, machine learning, architectures for multimodal dialogue systems, system evaluation, and data annotation.

The Structure of Multimodal Dialogue II

Author: Martin M. Taylor

Publisher: John Benjamins Publishing

Published: 2000-03-15

Total Pages: 542

ISBN-10: 9027273871

Most dialogues are multimodal. When people talk, they use not only their voices but also facial expressions, other gestures, and perhaps even touch. When computers communicate with people, they use pictures and perhaps sounds together with textual language, and when people communicate with computers, they are likely to use mouse “gestures” almost as much as words. How are such multimodal dialogues constructed? This is the main question addressed in this selection of papers from the second “Venaco Workshop”, sponsored by the NATO Research Study Group RSG-10 on Automatic Speech Processing and by the European Speech Communication Association (ESCA).

An Evaluation Framework for Multimodal Interaction

Author: Ina Wechsung

Publisher: Springer Science & Business Media

Published: 2014-01-06

Total Pages: 204

ISBN-10: 3319038109

This book presents (1) an exhaustive and empirically validated taxonomy of quality aspects of multimodal interaction, together with the respective measurement methods; (2) a validated questionnaire specifically tailored to the evaluation of multimodal systems and covering most of the taxonomy's quality aspects; (3) insights into how the quality perceptions of multimodal systems relate to the quality perceptions of their individual components; (4) a set of empirically tested factors that influence modality choice; and (5) models of the relationship between the perceived quality of a modality and its actual usage.

Multimodal Conversation Modeling Via Neural Perception, Structure Learning, and Communication

Author: Zilong Zheng

Publisher:

Published: 2021

Total Pages: 144

ISBN-13:

Multimodal conversation modeling is an important and challenging problem in building conversational agents. Pioneering works mostly focus on end-to-end multimodal fusion techniques, which require large volumes of paired data and lack interpretability. This dissertation aims to close the loop of vision-and-language multimodal modeling from the perspectives of neural perception, structure learning, and communication. Specifically, it makes four major contributions:

1. We explicitly model the joint distribution of vision and language as a Gibbs distribution. We then propose an "analysis by synthesis" cooperative training scheme that uses the learned joint distribution to sample from one modality to another, e.g. category to image or attribute to image. Further, we argue that this training paradigm can be explained in terms of cognitive theory: the conditional generator is a fast-thinking initializer that provides a rough output, and the sampling process is a slow-thinking solver that refines the output with detailed multimodal information.

2. We propose to view multimodal dialogue as a graph, where each node is a round of dialogue and the edges represent the semantic dependencies among dialogue turns. Moreover, we propose an Expectation-Maximization (EM)-based algorithm that can both predict partially observed nodes and infer graph structures. We show that this unsupervised structure learning paradigm can provide post-hoc interpretability for various multimodal dialogue tasks.

3. We present a crucial but barely discussed challenge in conversational reasoning: implicature and pragmatics. We show that humans communicate based on their intents and beliefs, where implicatures commonly arise. To fill this gap in the natural language community, we propose a dataset generation protocol based on Spatial-Temporal And-Or-Graphs (ST-AOGs). We show that most state-of-the-art language models exhibit a large performance gap compared with humans.

4. We present a human-robot collaboration task, a bomb-defusing game, that requires explanations to help humans understand the machine's behavior. We argue that such explanations should be generated according to the user's mental preferences, i.e. utilities. We therefore propose an explanation generation algorithm based on a Hidden Markov Model (HMM), which treats the user's mental utilities as a hidden variable that changes based on observations. We show that, compared with a rule-based conversational system, our generated explanations are more natural and help in gaining human trust.
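
The first contribution pairs a generator that proposes a rough sample with a sampling process that refines it under a learned Gibbs distribution p(x) ∝ exp(-E(x)). The abstract does not name the sampler; Langevin dynamics is a common choice for energy-based models, so the minimal sketch below assumes it. The quadratic energy, the `CATEGORY_MEANS` table, and `fast_initializer` are hypothetical stand-ins for learned networks, used only to illustrate the initialize-then-refine loop.

```python
import random
from typing import Callable, Dict, List

Vector = List[float]

# Hypothetical conditional energy E(x | c) = 0.5 * ||x - mu_c||^2. In a real model
# this would be a learned network scoring an image x against a condition c.
CATEGORY_MEANS: Dict[str, Vector] = {"cat": [1.0, -2.0], "dog": [-1.5, 0.5]}


def grad_energy(x: Vector, category: str) -> Vector:
    """Gradient of the hypothetical quadratic energy with respect to x."""
    mu = CATEGORY_MEANS[category]
    return [xi - mi for xi, mi in zip(x, mu)]


def fast_initializer(category: str) -> Vector:
    """Hypothetical 'fast-thinking' generator: returns a crude guess (all zeros)."""
    return [0.0] * len(CATEGORY_MEANS[category])


def langevin_refine(
    x0: Vector,
    grad: Callable[[Vector], Vector],
    step_size: float = 0.5,
    n_steps: int = 100,
    add_noise: bool = True,
    seed: int = 0,
) -> Vector:
    """'Slow-thinking' refinement, one Langevin step per iteration:
    x <- x - (s^2 / 2) * dE/dx + s * N(0, 1), applied per coordinate."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        g = grad(x)
        x = [
            xi - 0.5 * step_size ** 2 * gi
            + (step_size * rng.gauss(0.0, 1.0) if add_noise else 0.0)
            for xi, gi in zip(x, g)
        ]
    return x


if __name__ == "__main__":
    rough = fast_initializer("cat")
    refined = langevin_refine(rough, lambda x: grad_energy(x, "cat"))
    print("rough:", rough, "refined sample:", [round(v, 2) for v in refined])
```

With `add_noise=False` the loop reduces to gradient descent and converges to the energy minimum; with noise it draws approximate samples around it, which is the sense in which the sampler "refines" rather than merely optimizes.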
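
The fourth contribution treats the user's mental utilities as a hidden variable that changes based on observations. A standard way to track such a variable in an HMM is forward filtering: predict with the transition model, then reweight by the observation likelihood and normalize. The sketch below assumes two hypothetical utility states, two hypothetical observation types, made-up probabilities, and a toy `choose_explanation_style` policy; the dissertation's actual state space and explanation generator are not described in the abstract.

```python
from typing import Dict, List, Sequence

Dist = Dict[str, float]

# Hypothetical hidden utility states, observations, and probabilities (not from the source).
PRIOR: Dist = {"values_detail": 0.5, "values_brevity": 0.5}
TRANSITION: Dict[str, Dist] = {
    "values_detail": {"values_detail": 0.9, "values_brevity": 0.1},
    "values_brevity": {"values_detail": 0.1, "values_brevity": 0.9},
}
EMISSION: Dict[str, Dist] = {
    "values_detail": {"asks_why": 0.8, "skips_explanation": 0.2},
    "values_brevity": {"asks_why": 0.3, "skips_explanation": 0.7},
}


def forward_filter(
    prior: Dist,
    transition: Dict[str, Dist],
    emission: Dict[str, Dist],
    observations: Sequence[str],
) -> List[Dist]:
    """Return the filtered belief P(utility_t | obs_1..t) after each observation."""
    belief = dict(prior)
    history: List[Dist] = []
    for obs in observations:
        # Predict: propagate the previous belief through the transition model.
        predicted = {s: sum(belief[r] * transition[r][s] for r in belief) for s in belief}
        # Update: weight by the observation likelihood under each state, then normalize.
        unnormalized = {s: predicted[s] * emission[s][obs] for s in predicted}
        total = sum(unnormalized.values())
        belief = {s: p / total for s, p in unnormalized.items()}
        history.append(belief)
    return history


def choose_explanation_style(belief: Dist) -> str:
    """Hypothetical policy: explain in the style matching the most probable utility."""
    best = max(belief, key=belief.get)
    return "detailed" if best == "values_detail" else "concise"


if __name__ == "__main__":
    beliefs = forward_filter(PRIOR, TRANSITION, EMISSION, ["asks_why", "asks_why"])
    print(beliefs[-1], choose_explanation_style(beliefs[-1]))
```

With these assumed numbers, a single "asks_why" observation moves the belief in `values_detail` from 0.5 to 0.4 / 0.55 ≈ 0.73, so the toy policy switches to detailed explanations; repeated "skips_explanation" observations push it toward concise ones.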

Multimodality in Language and Speech Systems

Author: Björn Granström

Publisher: Springer Science & Business Media

Published: 2013-04-17

Total Pages: 264

ISBN-10: 9401723672

This book is based on contributions to the Seventh European Summer School on Language and Speech Communication, held at KTH in Stockholm, Sweden, in July 1999 under the auspices of the European Language and Speech Network (ELSNET). The topic of the summer school was "Multimodality in Language and Speech Systems" (MiLaSS). Multimodality in interpersonal, face-to-face communication has been an important research topic for a number of years. With the increasing sophistication of computer-based interactive systems using language and speech, multimodal interaction has received renewed interest in terms of both human-human and human-machine interaction. Nine lecturers contributed to the summer school with courses on specialized topics, ranging from the technology and science of creating talking faces to computer-mediated human-human communication for people with disabilities. Eight of the nine lecturers are represented in this book. The summer school attracted more than 60 participants from Europe, Asia and North America, including not only graduate students but also senior researchers from both academia and industry.

Spoken, Multilingual and Multimodal Dialogue Systems

Author: Ramon Lopez Cozar Delgado

Publisher: John Wiley & Sons

Published: 2007-01-11

Total Pages: 272

ISBN-10: 047002156X

Dialogue systems are a very appealing technology with an extraordinary future. Spoken, Multilingual and Multimodal Dialogue Systems: Development and Assessment addresses the great demand for information about the development of advanced dialogue systems that combine speech with other modalities in a multilingual framework. It aims to give a systematic overview of dialogue systems and of recent advances in the practical application of spoken dialogue systems. Spoken dialogue systems are computer-based systems developed to provide information and carry out simple tasks using speech as the interaction mode; examples include travel information and reservation, weather forecast information, directory information, and product ordering. Multimodal dialogue systems aim to overcome the limitations of spoken dialogue systems, which use speech as their only means of communication, while multilingual systems allow interaction with users who speak different languages. The book presents a clear snapshot of the structure of a standard dialogue system by addressing its key components in the context of multilingual and multimodal interaction, and covers the assessment of spoken, multilingual and multimodal systems. In addition to the fundamentals of the technologies employed, it describes the development and evaluation of these systems and highlights recent advances in the practical application of spoken dialogue systems. This comprehensive overview is a must for graduate students and academics in the fields of speech recognition, speech synthesis, speech processing, language, and human–computer interaction technology. It will also prove to be a valuable resource for system developers working in these areas.

The Structure of Multimodal Dialogue II

Author: M. M. Taylor

Publisher: John Benjamins Publishing

Published: 2000

Total Pages: 541

ISBN-10: 9027221901
