Every year, thousands of new songs are created. Some are beautiful, some eventually become famous, but regardless of their beauty or popularity, they have all been created by human composers. Recently, however, this domain of human creativity was breached. The culprit? A song composed entirely by an artificial intelligence system called Neural Karaoke.
This is the work of Hang Chu, a Ph.D. student at the University of Toronto. He recently published his project: a song completely composed and vocalized by artificial intelligence, complete with a lyric video and a dancing-stickman music video. Check out the videos at the following link!
Hang Chu first collected a large amount of music data, then built a hierarchical RNN to analyze it, finding the most frequently recurring structures and picking out shared characteristics among songs of similar styles. He then built a multilayer neural network model with an original structure that generates a song bearing some resemblance to an input image. Each layer of this RNN is responsible for generating a specific musical component; at the same time, each layer is itself a two-layer LSTM and is connected to the other layers. The resulting music is of good quality and rich in elements. Thanks to the multilayer structure, the model can also generate dance moves and vocals. Based on this model, Hang Chu created the applications Neural Karaoke, Neural Dancing, and Neural Story.
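To make the "find the most frequently recurring structures" step concrete, here is a minimal sketch of pattern counting over note sequences. It assumes songs are represented as lists of MIDI pitch numbers; the toy data, the `frequent_patterns` helper, and the pattern length are hypothetical choices for illustration, not Hang Chu's actual analysis pipeline.

```python
from collections import Counter

# Toy corpus: each song is a sequence of MIDI pitch numbers (made-up data).
songs = [
    [60, 62, 64, 65, 64, 62, 60],   # C-major run up and back
    [60, 62, 64, 62, 60, 62, 64],
    [67, 65, 64, 62, 60, 62, 64],
]

def frequent_patterns(songs, n=3, top=5):
    """Count the most common n-note patterns across all songs."""
    counts = Counter()
    for song in songs:
        for i in range(len(song) - n + 1):
            counts[tuple(song[i:i + n])] += 1
    return counts.most_common(top)

print(frequent_patterns(songs))
```

Counting recurring n-note patterns like this is one simple way a model's training data can reveal the shared structure of songs in a given style.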
Side project of interest
The man behind this project, Hang Chu, is a Ph.D. student at the University of Toronto, currently doing computer vision research with his advisors Raquel Urtasun and Sanja Fidler. Before his Ph.D., Hang Chu earned his master's degree at Cornell University and his bachelor's degree at Shanghai Jiao Tong University. His personal homepage can be found at http://chuhang.github.io/
This is Hang Chu's first year at U of T. His past research covers machine learning and 2D-3D transfer and modeling. Although he now focuses on computer vision, he spends time on the side developing interesting projects that draw on all of his knowledge. Thus this music generation model was born, reaching the state demonstrated above in just two weeks.
A multilayer RNN helps the AI produce richer music
The core technique behind Neural Karaoke is the hierarchical RNN. In current AI research, the RNN (recurrent neural network) is a very important and popular machine learning method. Rather than simply making a single RNN deeper, a hierarchical RNN stacks new RNNs on top of the original one. By making the neural network model hierarchical, Hang Chu can task each layer of the network with composing a different component of the music; which component a given layer is responsible for is flexible and replaceable. In his model, for example, the base-layer RNN generates the melody. The second layer, built on top of the first, composes chords, and the third layer generates drum beats. Altogether, the model outputs richly layered music. The sketch below illustrates this layered structure.
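As a rough illustration of the layering, here is a minimal PyTorch sketch in which a melody LSTM feeds a chord LSTM, which in turn feeds a drum LSTM. The class name, vocabulary sizes, and hidden dimensions are invented for the example; this is a sketch of the general idea, not Hang Chu's published model.

```python
import torch
import torch.nn as nn

class HierarchicalMusicRNN(nn.Module):
    """Illustrative hierarchy: melody -> chords -> drums.
    All dimensions and vocabularies are made up for this example."""
    def __init__(self, melody_dim=128, chord_dim=48, drum_dim=32, hidden=256):
        super().__init__()
        # Base layer generates the melody from its own past output.
        self.melody_rnn = nn.LSTM(melody_dim, hidden, batch_first=True)
        # Second layer composes chords, conditioned on melody features.
        self.chord_rnn = nn.LSTM(hidden, hidden, batch_first=True)
        # Third layer adds drum beats on top of the chord features.
        self.drum_rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.melody_out = nn.Linear(hidden, melody_dim)
        self.chord_out = nn.Linear(hidden, chord_dim)
        self.drum_out = nn.Linear(hidden, drum_dim)

    def forward(self, melody_tokens):
        # melody_tokens: (batch, time, melody_dim) one-hot or embedded notes
        m_feat, _ = self.melody_rnn(melody_tokens)
        c_feat, _ = self.chord_rnn(m_feat)   # chords see melody features
        d_feat, _ = self.drum_rnn(c_feat)    # drums see chord features
        return (self.melody_out(m_feat),
                self.chord_out(c_feat),
                self.drum_out(d_feat))

model = HierarchicalMusicRNN()
x = torch.zeros(1, 16, 128)  # one bar of 16 dummy melody steps
melody, chords, drums = model(x)
```

The key design point is that each layer conditions on the features of the layer below it, so chords are generated in the context of the melody, and drums in the context of both.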
In this multilayer RNN, each layer is connected to the next. A two-layer LSTM (long short-term memory) network is built into every layer to make up for a plain RNN's short memory. Hang Chu also extended the set of music-generating components: beyond the basic melody, chords, and drum beats, he added dance and lyrics to the network as well. With all of these generation capabilities combined in one AI model, it can ideally take nothing but an input image and produce music in a matching style, with a voice singing and a stick man dancing.
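One simple way to realize "nothing but an input image" is to map an image feature vector into the initial hidden state of a two-layer LSTM, as in the hedged sketch below. The `img_to_h0` projection, the dimensions, and the random tensors standing in for CNN features and note embeddings are all assumptions made for illustration, not details from the project.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: condition generation on an image by projecting a
# CNN feature vector into the initial hidden state of a two-layer LSTM.
hidden, img_feat_dim, note_dim = 256, 512, 128

img_to_h0 = nn.Linear(img_feat_dim, 2 * hidden)  # one state per LSTM layer
melody_lstm = nn.LSTM(note_dim, hidden, num_layers=2, batch_first=True)

img_feat = torch.randn(1, img_feat_dim)          # stand-in for CNN features
h0 = img_to_h0(img_feat).view(2, 1, hidden)      # (num_layers, batch, hidden)
c0 = torch.zeros(2, 1, hidden)

notes = torch.zeros(1, 16, note_dim)             # dummy note sequence
out, _ = melody_lstm(notes, (h0, c0))
```

Seeding the recurrent state this way lets every generated note be influenced by the image, while the two stacked LSTM layers give the model the longer memory the article describes.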
Neural Karaoke, Google Magenta, and Sony's Flow Machines
Hang Chu's Neural Karaoke is not the only AI music composer out there. Google's Magenta project and Sony's CSL lab have also published AI models capable of composing music.
Google Magenta's music-composing application, built on TensorFlow, is based on deep reinforcement learning and maximum likelihood. In this application, a Note-RNN (an RNN that generates notes) is created first. Then an LSTM is trained to predict the next note based on the target musical pattern. Finally, the notes are refined with reinforcement learning: a reward function combining music-theory rules with the Note-RNN's learned note probabilities determines the output note, which is fed back into the Note-RNN for the next step. The Magenta team mentioned that the combination of ML and RL is useful not only for generating music, but also for reducing unwanted failure modes in the trained network. Comparing the hierarchical RNN with the combination of RL and ML, Hang Chu noted that the two approaches do not conflict with each other, but are complementary. (music link: https://www.youtube.com/watch?v=6ZLB2-_0Hxw)
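As a toy illustration of such a combined reward (not Magenta's actual code), the sketch below adds a hand-written music-theory score to a term proportional to the log-probability the trained Note-RNN assigns to the candidate note. The scale rules, the weighting constant `c`, and the example probability are all made up for the example.

```python
import math

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes in the C-major scale

def theory_reward(note, prev_note):
    """Toy music-theory rules: stay in key, avoid repeating a note."""
    r = 1.0 if note % 12 in C_MAJOR else -1.0
    if note == prev_note:
        r -= 0.5
    return r

def combined_reward(note, prev_note, note_rnn_prob, c=0.5):
    """Total reward = theory rules + c * log p(note) from the Note-RNN."""
    return theory_reward(note, prev_note) + c * math.log(note_rnn_prob)

# Example: note 64 (E) after 62 (D), with the Note-RNN giving it p = 0.2
print(combined_reward(64, 62, 0.2))
```

The log-probability term keeps the RL policy close to what the Note-RNN learned from real music, while the rule-based term steers it away from outputs that break basic music theory.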
Sony's CSL lab has not published a specific paper, but Sony did present its achievement: software called *Flow Machines*. Cooperating with musician Benoît Carré, Sony brought us "Daddy's Car", a song replicating The Beatles' style. (music link: https://www.youtube.com/watch?v=LSHZ_b05W7o)
Making original music easier and cheaper
After this project, Hang Chu will keep his focus on computer vision and modeling research, but he will also put some effort into further improving his music-composing AI. He mentioned several directions that might be fun to delve into, such as combining the hierarchical RNN with RL, adding a musical-emotion component to the model, or studying the reversibility between music and images, i.e., painting a picture from an input piece of music.
Although this music-composing AI started off as a hobby project meant for fun, Hang Chu still has some expectations for it. It could evolve into a social networking app that lets people upload and share their own music and dance routines. Another possibility is using machines to reduce the money and time composing requires, making original music easier and cheaper to produce.
Analyst: Shaoyou Lu | Localized by Synced Global Team: Xiang Chen