Clash of Clans wasn’t a video game but rather a way of life in 10th-century Japan. And of course the locals did not scroll through content options on their smartphones as we do now; they read actual scrolls. Japanese Emakimono (絵巻物) illustrated handscrolls and Ehon (絵本) picture books were popular storytelling media during the arts-and-culture-focused Heian period. Scrolling through the text and images brings dynamic characters and vivid scenes to life in a calligraphy-captioned experience that is as close to cinematic as the technology of the time allowed.
With declining numbers of art historians who can understand traditional Japanese scrolls, preserving the media and messages is a challenge. Working on the premise that facial expressions offer especially rich information not only about the scroll’s content but also about how these artworks were created, a team of researchers from the ROIS-DS Center for Open Data in the Humanities (CODH), University of Cambridge, Google Brain and MILA has introduced a dataset of faces extracted from such pre-modern Japanese artwork.
The KaoKore dataset includes 5552 RGB image files drawn from the 2018 Collection of Facial Expressions dataset of cropped face images from Japanese artworks. For use in supervised learning scenarios, two sets of labels have been applied to the faces: male and female for gender, and noble, warrior, incarnation, and commoner for social status.
The researchers have ensured the KaoKore dataset will work under different machine learning setups. They standardized all images to 256 x 256 pixels, giving them a uniform size and aspect ratio. The images are formatted like those in ImageNet, enabling KaoKore to serve as an alternative dataset under existing unsupervised learning setups.
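As a rough illustration of what an ImageNet-like, folder-per-class layout makes possible, the sketch below indexes such a directory tree using only the Python standard library. The directory structure, the function name index_image_folder, and the accepted file suffixes are assumptions made for this example; the official KaoKore release may organize its files and labels differently.

```python
from pathlib import Path

# Hypothetical layout (an assumption, not the official KaoKore structure):
#   root/<class_name>/<image file>
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_image_folder(root):
    """Return (class_names, samples) for a folder-per-class image tree.

    samples is a list of (image_path, class_index) pairs, where class_index
    is the position of the folder name in the sorted class_names list.
    """
    root = Path(root)
    # Each subfolder name is treated as one class label.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for index, name in enumerate(class_names):
        # Sort files so the resulting order is reproducible across runs.
        for path in sorted((root / name).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, index))
    return class_names, samples
```

Under these assumptions, pointing the function at a folder with subdirectories such as noble, warrior, incarnation, and commoner would yield one integer label per image, which is the form most existing training pipelines expect.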
The researchers applied various generative models to KaoKore, with results suggesting the dataset’s suitability for creative tasks. The state-of-the-art GAN model StyleGAN, for example, generated characters that aptly reflected the class variety in the dataset. Neural painting models, meanwhile, can create painting sequences from a single KaoKore image, offering insights into artistic technique and style by illustrating the painting steps.
Understanding an Emakimono handscroll also requires reading the cursive texts that tell the stories. These are presented in the kuzushiji writing style that was used in Japan from the 8th through 19th centuries. Today, however, only trained experts can read them, so the researchers used machine learning to automatically recognize and transcribe kuzushiji into modern Japanese characters.
Early work bridging machine learning and Japanese kanji characters drew on the digitization of some 300,000 old Japanese books that the National Institute of Japanese Literature (NIJL) and other institutes began in 2014. Bounding boxes were created for each character during the transcription process for some of the books. The CODH researchers who curated the dataset suggested that creating a separate dataset for bounding boxes could help machine learning techniques improve automated transcription performance.
In 2018, CODH and researchers from the Royal Grammar School, National Institute of Japanese Literature, MILA, and Google Brain created the Kuzushiji-MNIST, Kuzushiji-49, and Kuzushiji-Kanji datasets. In 2019, a largely overlapping group of CODH and MILA researchers proposed KuroNet, a new end-to-end model for kuzushiji recognition.
Datasets and techniques tailored for machine learning can be expected to broaden research into, and preservation of, the pre-modern scrolls that are a huge part of Japanese art history.
The paper KaoKore: A Pre-Modern Japanese Art Facial Expression Dataset is on arXiv. The KaoKore and Kuzushiji-MNIST datasets are available on GitHub.
Journalist: Fangyu Cai | Editor: Michael Sarazen