Objective Viseme Extraction and Audiovisual Uncertainty: Estimation Limits between Auditory and Visual Modes

Melenchón Maldonado, Javier; Simó, Jordi; Cobo Rodríguez, Germán; Martínez Marroquín, Elisa

Publication date

2007-08

URI http://hdl.handle.net/20.500.14342/2972

Abstract

An objective method for obtaining consonant visemes for any given Spanish speaker is proposed. The speaker's face is recorded while uttering a balanced set of sentences, and the recording is stored as an audiovisual sequence. The visual and auditory modes are segmented by allophone, and a distance matrix is built to find allophones that are perceived as visually similar. The results correlate highly with earlier, more laborious subjective evaluations, even though those evaluations were carried out for English. Estimation between modes is also studied, revealing a tradeoff between performance in the two modes: given a set of auditory groups and a set of visual groups for each grouping criterion, increasing the estimation performance of one mode decreases that of the other. Moreover, the tradeoff is very similar (<7% between maximum and minimum values) in all observed examples.
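As a rough illustration of the distance-matrix idea mentioned in the abstract, the following Python sketch groups allophones whose visual features lie close together. The abstract does not specify the features, the distance measure, or the grouping rule, so everything here is an assumption: the two-dimensional feature vectors are hypothetical toy values (not data from the paper), Euclidean distance is used, and a simple single-linkage threshold stands in for whatever grouping criterion the authors applied.

```python
import math
from itertools import combinations


def distance_matrix(features):
    """Pairwise Euclidean distances between per-allophone feature vectors."""
    labels = sorted(features)
    dist = {a: {b: 0.0 for b in labels} for a in labels}
    for a, b in combinations(labels, 2):
        d = math.dist(features[a], features[b])
        dist[a][b] = dist[b][a] = d
    return labels, dist


def group_by_threshold(labels, dist, threshold):
    """Single-linkage grouping: allophones closer than `threshold` are merged."""
    parent = {a: a for a in labels}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(labels, 2):
        if dist[a][b] < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for a in labels:
        groups.setdefault(find(a), []).append(a)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy features (e.g. lip opening, lip width); NOT data from the paper.
toy_features = {
    "p": (0.05, 0.50),
    "b": (0.06, 0.52),
    "m": (0.04, 0.49),
    "f": (0.20, 0.60),
    "s": (0.35, 0.80),
    "t": (0.38, 0.78),
}

labels, dist = distance_matrix(toy_features)
visemes = group_by_threshold(labels, dist, threshold=0.1)  # threshold is an assumption
print(visemes)
```

With these toy values the bilabials fall into one candidate viseme group and "f" stays on its own; a different threshold or feature set would change the grouping.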

Document Type

Conference object

Language

English

Subject (UDC)

531/534 - Mechanics

Keywords

Audiovisual communication

Auditory perception

Pages

4 p.

Publisher

International Conference on Auditory-Visual Speech Processing, Hilvarenbeek, August 31 to September 3 2007
