Objective Viseme Extraction and Audiovisual Uncertainty: Estimation Limits between Auditory and Visual Modes
Author
Melenchón Maldonado, Javier
Simó, Jordi
Cobo Rodríguez, Germán
Martínez Marroquín, Elisa
Other authors
Universitat Ramon Llull. La Salle
Publication date
2007-08
Abstract
An objective method is proposed for obtaining consonant visemes for any given Spanish speaker. The speaker's face is recorded while uttering a balanced set of sentences, and the recording is stored as an audiovisual sequence. The visual and auditory modes are segmented by allophones, and a distance matrix is built to find allophones that are perceived as visually similar. Results show high correlation with earlier, tedious subjective evaluations, even though those evaluations were carried out for English. In addition, estimation between modes is studied, revealing a tradeoff between the performances in the two modes: given a set of auditory groups and a set of visual groups for each grouping criterion, increasing the estimation performance of one mode decreases that of the other. Moreover, the tradeoff is very similar (<7% between maximum and minimum values) in all observed examples.
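The distance-matrix step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which visual features, distance measure, or grouping rule were used, so the Euclidean distance, the threshold-based single-linkage grouping, the function names, and the toy feature values below are all assumptions made for illustration.

```python
import math


def distance_matrix(features):
    """Pairwise Euclidean distances between per-allophone feature vectors.

    `features` maps an allophone label to a tuple of visual measurements.
    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    labels = sorted(features)
    dist = {
        a: {b: math.dist(features[a], features[b]) for b in labels}
        for a in labels
    }
    return labels, dist


def group_by_threshold(labels, dist, threshold):
    """Group allophones linked by distances below `threshold` (single linkage).

    This grouping rule is a hypothetical stand-in for whatever criterion
    the paper applies to its distance matrix.
    """
    groups = []
    unvisited = set(labels)
    for start in labels:
        if start not in unvisited:
            continue
        unvisited.remove(start)
        group, stack = [start], [start]
        while stack:
            current = stack.pop()
            near = [x for x in labels
                    if x in unvisited and dist[current][x] < threshold]
            for x in near:
                unvisited.remove(x)
                group.append(x)
                stack.append(x)
        groups.append(sorted(group))
    return groups


# Made-up 2-D features (for example lip opening and lip width).
# These numbers are illustrative only and are not data from the paper.
toy_features = {
    "p": (0.10, 0.50),
    "b": (0.12, 0.52),
    "m": (0.11, 0.49),
    "f": (0.30, 0.60),
    "s": (0.45, 0.80),
    "t": (0.47, 0.78),
}

labels, dist = distance_matrix(toy_features)
visemes = group_by_threshold(labels, dist, threshold=0.05)
print(visemes)  # [['b', 'm', 'p'], ['f'], ['s', 't']]
```

With these toy values the bilabials fall into one group, which is the kind of visually similar cluster a viseme represents. A different threshold or distance measure would yield different groups.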
Document type
Conference object
Language
English
Subjects (UDC)
531/534 - Mechanics. Vibrations. Acoustics
Keywords
Audiovisual communication
Auditory perception
Pages
4 p.
Published by
International Conference on Auditory-Visual Speech Processing, Hilvarenbeek, August 31 to September 3 2007
Rights
© International Speech Communication Association. All rights reserved