Speech synthesis of Valencian using a conditional variational autoencoder with adversarial learning
Publication date
2024-09-25
ISBN
9781643685434
ISSN
1879-8314
Abstract
The growing demand for high-quality speech synthesis systems in minority languages presents a notable challenge for researchers. In response, this study focuses on synthesising Valencian speech in order to develop an effective text-to-speech system for this linguistic variety. A carefully recorded corpus comprising 7 hours of speech data was used to train a model based on a conditional variational autoencoder with adversarial learning, specifically Variational Inference with adversarial learning for end-to-end Text-to-Speech (VITS). Additionally, a pretrained multispeaker model was fine-tuned using 30 minutes of speech and using the entire corpus. Perceptual testing was conducted to evaluate the quality of the synthesised speech, with promising results. Notably, the proposed model was competitive with the Valencian model recently released by the Aina project, indicating its efficacy in generating natural and fluent Valencian speech. These findings contribute to advancing Valencian text-to-speech synthesis and carry implications for the development of speech synthesis systems in other minority languages.
Document type
Article
Document version
Published version
Language
English
Subjects (UDC)
00 - Science and knowledge. Research. Culture. Humanities
004 - Computer science
81 - Linguistics and languages
Pages
4 p.
Published by
IOS Press
Published in
Artificial Intelligence Research and Development - Proceedings of the 26th International Conference of the Catalan Association for Artificial Intelligence
Rights
© The author(s)
Except where otherwise noted, this item's licence is described as http://creativecommons.org/licenses/by-nc/4.0/


