A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept
Author
Freixes Guerreiro, Marc
Alías Pujol, Francesc
Socoró Carrié, Joan Claudi
Other authors
Universitat Ramon Llull. La Salle
Publication date
2019-12
Abstract
Text-to-speech (TTS) synthesis systems have been widely used in general-purpose applications based on the
generation of speech. Nonetheless, there are some domains, such as storytelling or voice output aid devices, which
may also require singing. To enable a corpus-based TTS system to sing, a supplementary singing database should be
recorded. This solution, however, might be too costly for occasional singing needs, or even unfeasible if the original
speaker is unavailable or unable to sing properly. This work introduces a unit selection-based
text-to-speech-and-singing (US-TTS&S) synthesis framework, which integrates speech-to-singing (STS) conversion to
enable the generation of both speech and singing from an input text and a score, respectively, using the same neutral
speech corpus. The viability of the proposal is evaluated considering three vocal ranges and two tempos on a
proof-of-concept implementation using a 2.6-h Spanish neutral speech corpus. The experiments show that
challenging STS transformation factors are required to sing beyond the corpus vocal range and/or with notes longer
than 150 ms. While score-driven US configurations allow the reduction of pitch-scale factors, time-scale factors are not
reduced due to the short length of the spoken vowels. Moreover, in the MUSHRA test, text-driven and score-driven US
configurations obtain similar naturalness rates of around 40 for all the analysed scenarios. Although these naturalness
scores are far from those of Vocaloid, the singing scores of around 60 validate that the
framework could reasonably address occasional singing needs.
Document type
Article
Published version
Language
English
Subjects (UDC)
78 - Music
Keywords
Speech
Pages
14 p.
Published by
SpringerLink
Published in
EURASIP Journal on Audio, Speech, and Music Processing. 2019:22
Grant agreement number
info:eu-repo/grantAgreement/MINECO i FEDER/PN I+D Excelencia/TEC2016-81107-P
info:eu-repo/grantAgreement/SUR del DEC i FSE/FI/2016FI_B2 00094
info:eu-repo/grantAgreement/URL i La Caixa/Intensificació recerca PDI/2018-URL-IR1rQ-021
info:eu-repo/grantAgreement/URL i La Caixa/Intensificació recerca PDI/2018-URL-IR2nQ-029
Rights
© The author(s)
Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by/4.0/