Automatic music transcription (AMT) is the task of converting raw audio recordings into symbolic representations such as the Musical Instrument Digital Interface (MIDI) standard. The field presents a variety of research challenges in signal processing and AI, as music signals often contain multiple sound sources correlated over time and frequency. In recent years, neural network based approaches — which can simultaneously detect musical events such as note onsets, offsets and pitches — have increasingly delivered state-of-the-art (SOTA) results on AMT tasks.
AMT for piano music remains notoriously tricky because of the highly polyphonic nature of the instrument. In the recent paper High-Resolution Piano Transcription with Pedals by Regressing Onset and Offset Times, researchers from TikTok developer ByteDance introduce a high-resolution piano transcription system trained by regressing the precise onset and offset times of piano notes and pedals. The approach outperforms Google's Onsets and Frames based system to set a new SOTA for piano note transcription.
Previous piano transcription systems typically split audio recordings into frames and use discriminative models to predict the presence or absence of onsets and offsets in each frame. This frame-wise classification restricts transcription resolution to the frame hop size. Moreover, any misalignment between onset or offset labels and the audio recordings makes it difficult to precisely detect onset or offset times.
The researchers also note that even though sustain pedals play an essential part in the piano's musical expression, current AMT systems do not typically perform pedal transcription.
Rather than classifying the frame-wise presence probabilities of onsets and offsets as previous systems have, the proposed approach combines note and pedal transcription subsystems with an analytical algorithm to predict the continuous onset and offset times of all note and pedal events.
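The core idea can be illustrated with a minimal sketch. The hop size, target half-width, and helper names below are illustrative assumptions, not the paper's exact implementation: a network is trained to regress soft triangular targets that peak at the exact onset time, and at inference an analytical step interpolates between neighbouring frame values to recover a continuous onset time finer than the frame grid.

```python
import numpy as np

HOP = 0.010   # frame hop in seconds (assumed value for illustration)
J = 5         # half-width of the triangular target, in frames (assumed)

def regression_targets(onset_time, n_frames, hop=HOP, j=J):
    """Soft triangular targets: 1.0 at the exact onset time, decaying
    linearly to 0 within j frames — unlike a single binary frame label,
    these encode the sub-frame position of the onset."""
    centers = np.arange(n_frames) * hop
    return np.clip(1.0 - np.abs(centers - onset_time) / (j * hop), 0.0, 1.0)

def refine_onset(g, hop=HOP, j=J):
    """Analytically recover a continuous onset time from frame-wise
    values: pick the peak frame, then interpolate with its neighbours.
    For ideal triangular targets, right - left = 2*delta/j, so the
    sub-frame offset is delta = j * (right - left) / 2."""
    i = int(np.argmax(g))
    if 0 < i < len(g) - 1:
        delta = j * (g[i + 1] - g[i - 1]) / 2.0
    else:
        delta = 0.0
    return (i + delta) * hop

# An onset at 123.4 ms is recovered exactly despite the 10 ms frame grid.
g = regression_targets(onset_time=0.1234, n_frames=50)
print(round(refine_onset(g), 4))  # → 0.1234
```

With plain frame-wise classification, the best possible estimate here would be the nearest frame center (120 ms); regressing continuous targets removes that quantization error.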
On the large-scale MAESTRO dataset of paired audio recordings and high-precision MIDI files, the system achieved an onset F1 score of 96.72 percent, outperforming Google's SOTA Onsets and Frames system (94.80 percent). In the first sustain pedal transcription evaluation on the MAESTRO dataset, the system set the benchmark with a pedal onset F1 score of 91.86 percent.
Experiment results also showed the pedal transcription system performing well on five-second audio clips. The team says it intends to extend the new approach to the transcription of other instruments. Some speculate that TikTok owner ByteDance might use the research to develop new music sources and creative possibilities for its popular short-video platform.
The paper High-Resolution Piano Transcription with Pedals by Regressing Onset and Offset Times is on arXiv, and the source code is on GitHub.
Reporter: Fangyu Cai | Editor: Michael Sarazen