Forget About Catastrophic Forgetting: Google’s Continual HyperTransformer Enables Efficient Continual Few-Shot Learning

In the new paper Continual Few-Shot Learning Using HyperTransformers, a Google Research team proposes Continual HyperTransformer, which modifies the recently published HyperTransformer few-shot learning method to sequentially update a convolutional neural network’s weights based on the information in a new task without forgetting the knowledge it learned from previous tasks.

Continual few-shot learning techniques enable AI models to learn from a continuous stream of tasks described by a small set of samples without forgetting their previously learned information. This learning paradigm is beneficial in real-world applications such as industrial robotics, where a deployed agent must learn in a dynamic environment with limited observations, and in privacy preservation, where sequential training shares only the model weights without exposing the data.

A Google Research team advances this research direction in the new paper Continual Few-Shot Learning Using HyperTransformers, proposing Continual HyperTransformer (CHT), a model that modifies the recently published HyperTransformer (HT, Zhmoginov et al., 2022) to sequentially update the weights of a convolutional neural network (CNN) based on the information in a new task without forgetting the knowledge learned from previous tasks.

The paper outlines the main advantages of the proposed CHT approach as follows:

  1. CHT is able to generate and update the weights of the CNN on the fly with no training required.
  2. Models learned with CHT do not suffer from catastrophic forgetting. Smaller models even show cases of positive backward transfer, where performance on a given task actually improves with subsequently generated weights.
  3. While CHT is trained to optimize for T tasks, the model can be stopped at any point t ≤ T during inference with weights θt that are suited for all tasks 0 ≤ τ ≤ t.
  4. The CHT model is designed to be independent of a specific step and to operate as a recurrent system. It can be used to learn a larger number of tasks than it was originally trained for.

Given a set of CNN weights generated from previously encountered tasks and a description of a new task, the proposed CHT model aims to update the weights such that they are suitable for all previous tasks as well as the new task.
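
As a rough illustration of this sequential update, the sketch below loops over a stream of few-shot tasks and feeds the previously generated weights back into a weight generator along with the new task’s support set. The `WeightGenerator` class, the weight dimension, and the toy mixing rule are stand-ins invented for the example, not the actual HyperTransformer:

```python
import numpy as np

rng = np.random.default_rng(0)

class WeightGenerator:
    """Toy stand-in for the HyperTransformer weight generator (illustrative only)."""
    def __init__(self, weight_dim):
        self.weight_dim = weight_dim

    def __call__(self, prev_weights, support_x, support_y):
        # A real HT attends over embeddings of the support samples together with
        # the previously generated weights; here we simply mix the two sources
        # so the recurrent structure is visible.
        task_summary = support_x.mean(axis=0)[: self.weight_dim]
        return 0.5 * prev_weights + 0.5 * task_summary

weight_dim = 16
generator = WeightGenerator(weight_dim)

theta = np.zeros(weight_dim)  # weights before any task has been seen
task_stream = [
    (rng.normal(size=(5, weight_dim)), rng.integers(0, 5, size=5))  # 5 support samples per task
    for _ in range(3)
]

# Sequential update: theta_t = CHT(theta_{t-1}, support set of task t),
# with no gradient-based training at this stage.
for t, (support_x, support_y) in enumerate(task_stream):
    theta = generator(theta, support_x, support_y)
    print(f"task {t}: generated weight vector with norm {np.linalg.norm(theta):.3f}")
```

The point of the sketch is the recurrence itself: the weights produced after task t serve as an input when generating the weights for task t + 1.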

The researchers extend the HT approach to enable CHT to handle a continual stream of tasks by using the generated weights from already learned tasks as input weight embeddings into the weight generator for new tasks. They also change the HT’s cross-entropy loss function to a more flexible prototypical loss (Snell et al., 2017) that employs prototypes as a learned representation of every class from all the tasks.
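
The prototypical loss of Snell et al. (2017) represents each class by the mean embedding of its support samples and classifies queries via a softmax over negative squared distances to these prototypes; in the continual setting, the prototype set would grow to cover classes from every task seen so far. Below is a minimal sketch with generic NumPy embeddings standing in for the CNN features; the shapes and data are purely illustrative:

```python
import numpy as np

def prototypical_loss(query_emb, query_labels, support_emb, support_labels):
    """Prototypical loss (Snell et al., 2017) over generic embeddings.

    Each class is represented by the mean of its support embeddings, and
    queries are scored by a softmax over negative squared distances to these
    prototypes. This is a generic sketch, not the paper's exact continual
    formulation.
    """
    classes = np.unique(support_labels)
    # One prototype per class: the mean embedding of its support samples.
    prototypes = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in classes]
    )
    # Squared Euclidean distance from every query to every prototype.
    dists = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    logits = -dists
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy on the index of each query's true class.
    target_idx = np.searchsorted(classes, query_labels)
    return -log_probs[np.arange(len(query_labels)), target_idx].mean()

rng = np.random.default_rng(0)
support_emb = rng.normal(size=(10, 8))          # 5 classes x 2 shots, 8-dim embeddings
support_labels = np.repeat(np.arange(5), 2)
query_emb = rng.normal(size=(6, 8))
query_labels = rng.integers(0, 5, size=6)
print(prototypical_loss(query_emb, query_labels, support_emb, support_labels))
```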

Their “preemptive” continual learning approach enables a CHT model trained on T tasks to be run on any number of tasks τ < T while producing well-performing weights θτ for all the tasks it has seen up to that point, thus generating the CNN weights on the fly without any extra training.

The proposed CHT architecture is recurrent: because the HT parameters do not depend on task-specific information but instead take the previous weights and the support set as input, the trained model can learn to generate weights for any additional unseen tasks. The researchers also demonstrate that CHT does not suffer from the catastrophic forgetting issue that has plagued other sequential learning approaches and can even exhibit positive backward transfer, where performance on previously learned tasks improves.
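
A simple way to picture the forgetting check is to take the weights θt produced after each task and score them on the query sets of all tasks τ ≤ t; unchanged or improving scores for earlier tasks indicate no forgetting, or positive backward transfer. The sketch below does exactly that with hypothetical `generated_weights` and `query_sets` containers and a dummy scoring function standing in for a run of the actual CNN:

```python
import numpy as np

rng = np.random.default_rng(1)

def evaluate(theta, task_query):
    """Dummy stand-in for evaluating the CNN parameterized by theta on a
    task's query set; a real run would return classification accuracy."""
    return float(np.dot(theta, task_query.mean(axis=0)))

# Hypothetical containers: generated_weights[t] holds theta_t after task t
# (e.g. from the sequential loop sketched earlier), and query_sets[t] holds
# that task's held-out query examples.
generated_weights = [rng.normal(size=16) for _ in range(3)]
query_sets = [rng.normal(size=(8, 16)) for _ in range(3)]

# Forgetting check: theta_t should still score well on every task tau <= t.
# Positive backward transfer would show up as a fixed task's score improving
# as t grows.
for t, theta_t in enumerate(generated_weights):
    scores = [evaluate(theta_t, query_sets[tau]) for tau in range(t + 1)]
    print(f"theta_{t} on tasks 0..{t}: " + ", ".join(f"{s:.2f}" for s in scores))
```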

In their empirical study, the team compared CHT to Constant ProtoNet (ConstPN) and Merged HyperTransformer (MergedHT) baseline models in both task-incremental and class-incremental scenarios. In the evaluations, CHT consistently outperformed ConstPN and surpassed MergedHT when trained on more tasks.

This work validates the proposed CHT as an efficient few-shot learner free of the catastrophic forgetting issue and suitable for various use scenarios.

The paper Continual Few-Shot Learning Using HyperTransformers is on arXiv.


Author: Hecate He | Editor: Michael Sarazen


