Speak a Foreign Language in Your Own Voice? Microsoft’s VALL-E X Enables Zero-Shot Cross-Lingual Speech Synthesis

In the new paper "Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling," a Microsoft research team presents VALL-E X, a simple yet effective cross-lingual neural codec language model. VALL-E X inherits the strong in-context learning capabilities of VALL-E and achieves high-quality zero-shot cross-lingual speech synthesis.