Audio Processing

by Synced 2024-05-21

Generalizable Audio AI: Discover the Power of SpeechVerse by Amazon AWS AI Labs

In the new paper SpeechVerse: A Large-scale Generalizable Audio Language Model, a research team from Amazon AWS AI Labs introduces SpeechVerse, a robust multi-task framework that leverages supervised instruction fine-tuning to achieve strong performance across various speech tasks.

by Synced 2024-02-11

AI, Machine Learning & Data Science, Research

Introducing NVIDIA’s Audio Flamingo, the Next Frontier in Audio Language Models

An NVIDIA research team introduces Audio Flamingo, a groundbreaking audio language model that incorporates in-context learning (ICL), retrieval-augmented generation (RAG), and multi-turn dialogue capabilities, achieving SOTA performance across various audio understanding tasks.

by Synced 2022-06-22

AI, Machine Learning & Data Science, Research

A WaveNet Rival? Stanford U Study Models Raw Audio Waveforms Over Contexts of 500k Samples

In the new paper GoodBye WaveNet — A Language Model for Raw Audio with Context of 1/2 Million Samples, Stanford University researcher Prateek Verma presents a generative auto-regressive architecture that models audio waveforms over contexts greater than 500,000 samples and outperforms state-of-the-art WaveNet baselines.

by Synced 2020-10-12

Machine Learning & Data Science

ByteDance High-Resolution AMT System Achieves SOTA in Piano Note and Pedal Transcription

ByteDance introduces a high-resolution piano transcription system trained by regressing the precise onset and offset times of piano notes and pedals.