Today, the “Godfather of Deep Learning,” Dr. Geoffrey Hinton, published the paper Dynamic Routing Between Capsules on arXiv. The paper is being hailed as a revolutionary work that could totally revamp computer vision research.
Co-authored with Sara Sabour and Nicholas Frosst from Google Brain Toronto, the paper proposes a neural network based on a capsule system. A capsule is a group of neurons that learns to recognize a visual entity and outputs both the probability that the entity is present within its limited domain and a set of “instantiation parameters” that include the entity’s pose, lighting and deformation.
In a multi-layer capsule system, an active capsule at one level uses dynamic routing to send its output to capsules at the next level up. A higher-level capsule becomes active when multiple incoming outputs agree.
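The routing idea can be made concrete with a small sketch. The code below is an illustration only, not the authors' implementation: it assumes each lower-level capsule has already produced a prediction vector for each higher-level capsule, and it uses a length-squashing function and three routing iterations as described in the paper. The function names (squash, softmax, route, length) and the toy prediction values are assumptions chosen for this example.

```python
import math

def squash(s):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0 for _ in s]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]

def softmax(logits):
    """Turn a list of logits into positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(b - m) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]

def length(vec):
    """Euclidean length, read here as the probability that the entity is present."""
    return math.sqrt(sum(x * x for x in vec))

def route(u_hat, num_iterations=3):
    """u_hat[i][j] is lower capsule i's prediction vector for higher capsule j."""
    num_lower, num_higher, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * num_higher for _ in range(num_lower)]  # routing logits
    v = []
    for _ in range(num_iterations):
        # Each lower capsule splits its output across the higher capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(num_higher):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(num_lower))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit wherever a prediction agrees with the resulting output.
        for i in range(num_lower):
            for j in range(num_higher):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    # Coupling weights implied by the final logits, returned for inspection.
    return v, [softmax(row) for row in b]

# Toy data (hypothetical): three lower capsules agree about higher capsule 0
# and disagree about higher capsule 1.
predictions = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.1], [-1.0, 0.0]],
    [[0.9, 0.0], [0.0, 1.0]],
]
outputs, coupling = route(predictions)
```

In this toy case the output vector for higher capsule 0 ends up longer than the one for capsule 1, and every lower capsule routes more of its output to capsule 0, which is the "activated if multiple outputs agree" behavior in miniature.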
According to the paper, a trained multi-layer CapsNet achieves state-of-the-art performance on MNIST, a widely used dataset of handwritten digits, and is considerably better than a convolutional network at recognizing highly overlapping digits.
In the paper’s introduction Dr. Hinton explains why he believes a capsule system represents the future of computer vision: “Human vision ignores irrelevant details by using a carefully determined sequence of fixation points to ensure that only a tiny fraction of the optic array is ever processed at the highest resolution. Introspection is a poor guide to understanding how much of our knowledge of a scene comes from the sequence of fixations and how much we glean from a single fixation, but in this paper we will assume that a single fixation gives us much more than just a single identified object and its properties.”
In 2011, when Dr. Hinton first proposed capsules, it seemed he was already turning his back on backpropagation — the method he himself had pioneered for neural networks back in the 1980s.
Most machine learning algorithms for neural networks work by adjusting weights, which set the influence a particular node has on the final output. For example, to enable a machine to detect an image of a cat, training will increase the weights on visual features that distinguish cats, such as eyes, ears and whiskers, rather than on their tails or legs.
Backpropagation is a method for training neural networks: it measures how much each weight contributes to the error in the output, then adjusts the weights to make that error as small as possible.
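As a rough illustration of that idea, the sketch below trains a deliberately tiny network with a single hidden unit on a single example. The network shape, the starting weights, the learning rate and the function names are all hypothetical choices made for this example. The backward pass applies the chain rule from the output error back to each weight, and a plain gradient-descent step then nudges the weights to reduce the error.

```python
import math

def forward(w1, w2, x):
    """Run the network: one tanh hidden unit followed by a linear output."""
    h = math.tanh(w1 * x)  # hidden activation
    y = w2 * h             # network output
    return h, y

def loss(w1, w2, x, target):
    """Squared error between the network output and the target."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2

def backward(w1, w2, x, target):
    """Propagate the error backwards to get the gradient for each weight."""
    h, y = forward(w1, w2, x)
    dL_dy = y - target             # error at the output
    dL_dw2 = dL_dy * h             # gradient for the output weight
    dL_dh = dL_dy * w2             # error passed back to the hidden unit
    dL_dz = dL_dh * (1.0 - h * h)  # through the tanh nonlinearity
    dL_dw1 = dL_dz * x             # gradient for the input weight
    return dL_dw1, dL_dw2

def train(w1, w2, x, target, learning_rate=0.1, steps=200):
    """Repeatedly move each weight a small step against its gradient."""
    for _ in range(steps):
        g1, g2 = backward(w1, w2, x, target)
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
    return w1, w2

initial = (0.5, -0.3)
trained = train(*initial, x=1.0, target=0.8)
loss_before = loss(*initial, x=1.0, target=0.8)
loss_after = loss(*trained, x=1.0, target=0.8)
```

Comparing loss_before with loss_after shows the error shrinking as the weights are adjusted. Real networks apply the same procedure to millions of weights across many labeled examples.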
Although backpropagation has thus far achieved great success in machine learning, many researchers question whether it is the correct method for pursuing artificial general intelligence (AGI), the long-range, human-intelligence-level target of contemporary AI technology. At an AI conference this September, Dr. Hinton said he was skeptical regarding backpropagation’s suitability for unsupervised learning, which is generally regarded as the pathway to AGI.
Because supervised learning requires overwhelming amounts of computing power and training data to produce accurate outputs, researchers have begun to explore unsupervised learning, which requires little or no labeled data. Dr. Hinton is now suggesting researchers start over using capsules as a new method.
Dr. Hinton led the rise of deep learning over the last 10 years. In 2006 he co-authored the paper A Fast Learning Algorithm for Deep Belief Nets, which first proposed the method of greedy layer-wise training for deep neural networks.
In 2012, Dr. Hinton led a University of Toronto team in the ImageNet competition, using convolutional neural networks (CNNs) for image recognition. They achieved outstanding results, and the method soon revolutionized the field of computer vision.
Dr. Hinton concludes the paper by saying that although capsules research is still in its infancy, the potential is strong and worth exploring. “Research on capsules is now at a similar stage to research on recurrent neural networks for speech recognition at the beginning of this century. There are fundamental representational reasons for believing that it is a better approach but it probably requires a lot more small insights before it can out-perform a highly developed technology.”
Author: Tony Peng | Editor: Michael Sarazen