Architecture and weights are two essential considerations for artificial neural networks. Architecture is akin to the innate human brain, and contains the neural network’s initial settings such as hyperparameters, layers, node connections (or wiring), etc. Weights, meanwhile, are the relative strengths of the connections between nodes after model training, which can be likened to a human brain that has learned, for example, how to multiply numbers or speak French.
As with the age-old “nature versus nurture” debate, AI researchers want to know whether architecture or weights play the main role in the performance of neural networks. In a blow to the “nurture” side, Google researchers have now demonstrated that a neural network which has not learned weights through training can still achieve satisfactory results in machine learning tasks.
Google Brain researchers Adam Gaier and David Ha said their idea was inspired by precocial behaviors that have evolved in nature, explaining in a blog post: “In biology, precocial species are those whose young already possess certain abilities from the moment of birth. There is evidence to show that lizard and snake hatchlings already possess behaviors to escape from predators. Shortly after hatching, ducks are able to swim and eat on their own, and turkeys can visually recognize predators.”
A majority of research efforts in machine learning over the past decades have involved designing appropriate neural network architectures for specific tasks — for example convolutional neural networks for computer vision and pattern recognition tasks or recurrent neural networks with long short-term memory for processing time-series data such as speech and language.
The goal of the new research is to find weight agnostic neural networks (WANN) with strong inductive biases that can perform various tasks using only random initial parameters. The process includes the following steps:
1. Researchers create a group of neural networks with the simplest architecture: no hidden nodes and only partially connected inputs and outputs;
2. Each network is evaluated over multiple rollouts, with a different shared weight value (-2, -1, -0.5, +0.5, +1, +2) assigned at each rollout;
3. Networks are ranked according to their performance and complexity;
4. Researchers alter the highest-ranked network topologies in one of three ways:
   - Insert a new node;
   - Connect previously unconnected nodes;
   - Reassign the activation function for a hidden node, choosing from both common functions (e.g. linear, sigmoid, ReLU) and more exotic ones (Gaussian, sinusoid, step);
5. The altered networks are then evaluated in turn, repeating from step 2.
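The evaluation and ranking steps above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: it assumes a layered feedforward topology stored as 0/1 connection masks, a hypothetical `task` callable that scores a policy, and a simplified ranking (mean score first, fewer connections as tie-breaker) standing in for whatever ranking scheme the paper actually uses. Only the list of shared weight values comes from the article.

```python
import numpy as np

# Shared weight values listed in the article; one value is used per rollout.
SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

def forward(masks, activations, x, w):
    """Run the network with every existing connection set to the same weight w.

    masks: list of 0/1 arrays (inputs x outputs), one per layer (assumed layout).
    activations: list of callables, one per layer.
    """
    h = x
    for mask, act in zip(masks, activations):
        h = act(h @ (mask * w))
    return h

def evaluate(masks, activations, task, weights=SHARED_WEIGHTS):
    """Score one topology once per shared weight; return (mean, best) score."""
    scores = [
        task(lambda x, w=w: forward(masks, activations, x, w))
        for w in weights
    ]
    return float(np.mean(scores)), float(np.max(scores))

def n_connections(masks):
    """Complexity measure used here: total number of active connections."""
    return int(sum(mask.sum() for mask in masks))

def rank(population, task):
    """Return population indices, best first.

    Simplification (an assumption of this sketch): sort by mean score,
    breaking ties in favour of the network with fewer connections.
    """
    scored = [
        (evaluate(masks, acts, task)[0], n_connections(masks), i)
        for i, (masks, acts) in enumerate(population)
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, _, i in scored]
```

Here `task` is any function that takes a policy (a callable mapping inputs to outputs) and returns a score where higher is better. The topology-altering step would then be applied to the networks at the front of the list returned by `rank`.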
Researchers evaluated the WANNs on three continuous control tasks (CartPoleSwingUp, BipedalWalker-v2, and CarRacing-v0) with random weight parameters, and results can be found in the table below. For example, the classic benchmark test for nonlinear control, CartPoleSwingUp, consists of a pole which acts as an inverted pendulum, attached to a cart. The model’s goal is to control cart movement in order to swing the pole into an upright and balanced position.
Researchers found the results were surprisingly good, as the WANN models with the best-performing shared weight values reached an upright pole position on the CartPoleSwingUp task after only a few swings.
Researchers also applied WANNs to a supervised image classification task and found that a network without weight training can achieve an accuracy of 82.0 percent ± 18.7 percent on the MNIST dataset. On Reddit, Ha commented “We also thought that a result of 80-90% (whether good or bad) seemed interesting enough for network initialized with random weights, especially when compared to chance accuracy.”
The paper has prompted a lively discussion on Twitter and Reddit. While some posters believe the research reflects an interesting side of neural networks, others argue that untrained neural networks are wholly impractical for real-world use.
Experimental results also showed that WANNs are no match for convolutional neural networks, which was an expected outcome.
The paper’s authors suggest their findings may help tackle a number of challenging problems in machine learning. “Effectively training models that rely on discrete components or utilize adaptive computation mechanisms with gradient-based methods remain a challenging research area. We hope this work will encourage further research that facilitates the discovery of new architectures that not only possess inductive biases for practical domains, but can also be trained with algorithms that may not require gradient computation.”
Read the paper on this interactive blog.
Journalist: Tony Peng | Editor: Michael Sarazen