Speech synthesis of Valencian using a conditional variational autoencoder with adversarial learning

Aragó, Joan; Freixes, Marc

dc.contributor	Universitat Ramon Llull. La Salle
dc.contributor.author	Aragó, Joan
dc.contributor.author	Freixes, Marc
dc.date.accessioned	2026-03-17T19:34:48Z
dc.date.available	2026-03-17T19:34:48Z
dc.date.created	2024
dc.date.issued	2024-09-25
dc.identifier.isbn	9781643685434	ca
dc.identifier.issn	1879-8314	ca
dc.identifier.uri	https://hdl.handle.net/20.500.14342/6074
dc.description.abstract	The growing demand for high-quality speech synthesis systems in minority languages presents a notable challenge for researchers. In response, this study focuses on synthesizing Valencian speech to develop an effective text-to-speech system for this linguistic variety. A meticulously recorded corpus comprising 7 hours of speech data was used to train a model based on a conditional variational autoencoder with adversarial learning, specifically Variational Inference with adversarial learning for end-to-end Text-to-Speech (VITS). Additionally, a pretrained multispeaker model was fine-tuned on 30 minutes of speech and on the entire corpus. Perceptual testing was conducted to evaluate the quality of the synthesised speech, revealing promising results. Notably, the proposed model proved competitive with the recently released Valencian model by the Aina project, indicating its efficacy in generating natural and fluent Valencian speech. These findings contribute to advancing the field of Valencian text-to-speech synthesis and carry implications for the development of speech synthesis systems in other minority languages.	ca
dc.format.extent	4 p.	ca
dc.language.iso	eng	ca
dc.publisher	IOS Press	ca
dc.relation.ispartof	Artificial Intelligence Research and Development - Proceedings of the 26th International Conference of the Catalan Association for Artificial Intelligence	ca
dc.rights	© The author	ca
dc.rights	Attribution-NonCommercial 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/	*
dc.subject.other	Human-machine communication	ca
dc.subject.other	AI applications	ca
dc.subject.other	Speech synthesis	ca
dc.title	Speech synthesis of Valencian using a conditional variational autoencoder with adversarial learning	ca
dc.type	info:eu-repo/semantics/article	ca
dc.rights.accessLevel	info:eu-repo/semantics/openAccess
dc.embargo.terms	none	ca
dc.subject.udc	00	ca
dc.subject.udc	004	ca
dc.subject.udc	81	ca
dc.identifier.doi	https://doi.org/10.3233/FAIA240427	ca
dc.description.version	info:eu-repo/semantics/publishedVersion	ca