A team of researchers from Mila and Google Brain believes simple pencil sketches could help AI models generalize better to unseen images.
Deep neural networks excel in practical perceptual tasks, and they keep getting stronger. As noted in Stanford’s AI Index 2019: “The time required to train a large image classification system on cloud infrastructure has fallen from about three hours in October 2017 to about 88 seconds in July 2019.”
However, smart machines’ ability to generalize to variations not seen during model training remains far from what humans are capable of. Humans quickly learn to pick up on salient qualities rather than fixating on small details. We can, for example, identify and understand cartoon characters even in the absence of many visual details. Machine learning algorithms fail to generalize in this way unless they have been explicitly trained to do so.
In a new paper, the Mila and Google Brain researchers introduce a “SketchTransfer” dataset and task wherein neural networks are evaluated on the quality of the abstractions they are able to learn without explicit supervision.
The SketchTransfer training dataset includes labeled real images from the CIFAR-10 dataset and unlabeled sketch images from the Quick, Draw! dataset in categories such as frogs, birds, cats, dogs, cars and planes.
SOTA models were tasked with recognizing and classifying objects in the sketches while being provided labels only for the real images. The task is challenging because, for example, the only clear difference between a dog sketch and a cat sketch might be the shape of the animal’s nose or ears. Unlike a human classifying the same images, a neural network might not immediately seek out this noticeable and distinctive feature.
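The evaluation protocol can be illustrated with a minimal, self-contained sketch. This is not the authors’ code: the data here is synthetic (toy one-dimensional features), and the "classifier" is a trivial nearest-centroid rule. The point is only the shape of the setup — the model is fit on labeled "real" examples, while "sketch" labels are withheld until test time, and the shift between the two domains degrades accuracy.

```python
# Illustrative (hypothetical) version of the SketchTransfer evaluation setup:
# fit on the labeled real domain only, then score on a shifted sketch domain.
import random

random.seed(0)

def make_domain(n, shift):
    """Toy 1-D features for two classes; `shift` mimics the real-to-sketch gap."""
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        x = random.gauss(label * 4.0 + shift, 1.0)
        data.append((x, label))
    return data

real_train = make_domain(200, shift=0.0)   # labeled real images
sketch_test = make_domain(200, shift=1.0)  # sketches: same classes, shifted style

# Fit a trivial nearest-centroid classifier on the real domain only.
centroids = {}
for c in (0, 1):
    xs = [x for x, y in real_train if y == c]
    centroids[c] = sum(xs) / len(xs)

def predict(x):
    return min(centroids, key=lambda c: abs(x - centroids[c]))

# Transfer accuracy: how well real-only training carries over to sketches.
acc = sum(predict(x) == y for x, y in sketch_test) / len(sketch_test)
print(f"transfer accuracy: {acc:.2f}")
```

In this toy setting the classifier still transfers well above chance but below what it would achieve if trained on labeled sketches directly, mirroring the gap the benchmark is designed to expose.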
Researchers observed that a SOTA technique which scored over 95 percent accuracy on MNIST to SVHN transfer was only able to manage 59 percent on the SketchTransfer task. While this was far better than random, it fell short of the 87 percent accuracy of a classifier trained directly on labeled sketches.
When humans look at the world, we ignore most of the visual information and focus our attention on relevant details that represent invariant abstractions. The researchers conclude that teaching machines to see and understand the world in such a fundamental, humanlike way is approachable with contemporary methods, but that there is “substantial room for improvement.”
SketchTransfer offers the community a new tool for examining deep networks and their abilities to generalize abstractions.
The paper SketchTransfer: A Challenging New Task for Exploring Detail-Invariance and the Abstractions Learned by Deep Networks is available on arXiv.
Journalist: Fangyu Cai | Editor: Michael Sarazen