Tag: Speech Synthesis

AI Machine Learning & Data Science Research

Microsoft’s NaturalSpeech 2 Outperforms Previous TTS Systems in Zero-Shot Speech and Singing Synthesis

In the new paper NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers, a Microsoft team introduces NaturalSpeech 2, a TTS system with latent diffusion models for natural and strong zero-shot voice synthesis that captures expressive prosodies with superior robustness.

AI Machine Learning & Data Science Research

Speak a Foreign Language in Your Own Voice? Microsoft’s VALL-E X Enables Zero-Shot Cross-Lingual Speech Synthesis

In the new paper Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling, a Microsoft research team presents VALL-E X, a simple yet effective cross-lingual neural codec language model that inherits strong in-context learning capabilities from VALL-E and demonstrates high-quality zero-shot cross-lingual speech synthesis performance.