Tag: Text-to-Speech

AI Machine Learning & Data Science Research

Microsoft’s NaturalSpeech 2 Outperforms Previous TTS Systems in Zero-Shot Speech and Singing Synthesis

In the new paper NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers, a Microsoft team introduces NaturalSpeech 2, a TTS system that uses latent diffusion models to deliver natural, robust zero-shot voice synthesis with expressive prosody.

AI Machine Learning & Data Science Natural Language Tech Research

Apple Neural TTS System Study: Combining Speakers of Multiple Languages to Improve Synthetic Voice Quality

An Apple research team explores multiple architectures and training procedures to develop a novel multi-speaker, multi-lingual neural TTS system. The study combines speech from 30 speakers across 15 locales and 8 languages, and demonstrates that for the vast majority of voices, such multi-lingual, multi-speaker models can yield better quality than single-speaker models.