Don’t be surprised if tomorrow’s YouTube stars are not real: advanced deep learning models may soon make virtual talking heads indistinguishable from today’s flesh-and-blood hosts.
YouTuber “TJWei” recently uploaded a “Face-off” video on YouTube. The four-minute split-screen clip shows two figures speaking and emoting simultaneously. Both look human, but one is an entity generated in real time by an AI algorithm.
The video has attracted attention from leading figures in artificial intelligence, and was retweeted by Ian Goodfellow, the creator of the Generative Adversarial Network (GAN). Weibo users, meanwhile, are abuzz with speculation on how the new technology might produce new YouTube stars, at a time when the video-sharing website’s leading personalities such as Ryan Higa and PewDiePie are making upwards of US$15 million a year.
The artificial intelligence technique behind the Face-off video is CycleGAN, a new type of GAN that can learn to translate the characteristics of one set of images onto another without using paired training data. Earlier methods of image-to-image translation rely on datasets of image pairs in which each source image has a corresponding target image. Such specialized datasets are, however, difficult to compile and expensive to obtain.
CycleGAN was first proposed in the paper Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, written by four researchers at the University of California, Berkeley. Using CycleGAN, the team created a two-step transformation that maps an image from the original domain to the target domain, and then maps the result back to recover the original image. This new type of GAN has achieved impressive results in various image-to-image translations, including horse-zebra object transfiguration, style transfer between photographs and paintings in the manner of Monet and Van Gogh, and seasonal transfer in landscapes.
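The “there and back again” idea can be made concrete with a small sketch of the cycle-consistency term, which penalizes the difference between an image and its round-trip reconstruction. This is an illustration only, not the authors’ implementation: the toy mappings `brighten`, `darken` and `darken_too_much` and the two-pixel “images” are hypothetical stand-ins for the neural network generators, and the full CycleGAN objective also includes adversarial losses that are omitted here.

```python
from typing import Callable, List

Image = List[float]  # a flattened toy "image" of pixel intensities
Mapping = Callable[[Image], Image]


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(
    xs: List[Image], ys: List[Image], G: Mapping, F: Mapping
) -> float:
    """Average round-trip error: x -> G -> F should return x, y -> F -> G should return y."""
    forward = sum(l1_distance(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1_distance(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


# Hypothetical stand-ins for the two generators (assumptions for illustration).
def brighten(img: Image) -> Image:         # plays the role of G: domain X -> domain Y
    return [p + 0.5 for p in img]


def darken(img: Image) -> Image:           # plays the role of F: domain Y -> domain X
    return [p - 0.5 for p in img]


def darken_too_much(img: Image) -> Image:  # an F that does not invert G
    return [p - 0.75 for p in img]


# Unpaired toy data: the two lists need not match in size or content.
domain_x = [[0.0, 0.25], [0.5, 0.25]]
domain_y = [[1.0, 0.75]]

perfect = cycle_consistency_loss(domain_x, domain_y, brighten, darken)
imperfect = cycle_consistency_loss(domain_x, domain_y, brighten, darken_too_much)
print(perfect, imperfect)
```

When the two mappings undo each other the loss is zero; when they do not, the loss grows with the reconstruction error. Note that the two domains are never matched up image by image, which is why no paired dataset is needed.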
Researchers and developers inspired by CycleGAN are experimenting with the technique in different applications such as the face-swapping video, whose creator TJWei open-sourced the code on his GitHub account. Energized by unsupervised learning models such as GANs, AI applications are expected to generate increasingly impressive video of this sort, and to do so without relying on large paired datasets.
However, such entities are unlikely to take over YouTube any time soon. The video output still has visible anomalies, and even as the results improve, looking almost but not quite human risks pushing such virtual entities into the “uncanny valley,” where they simply come across as creepy.
At this stage, PewDiePie’s job is probably safe.
Journalist: Tony Peng | Editor: Michael Sarazen