ETH Zurich Proposes a Robotic System Capable of Self-Improving Its Semantic Perception

A research team from ETH Zurich combines continual learning and self-supervision to propose a novel robot system that enables online life-long self-supervised learning of semantic scene understanding.

Mobile intelligent robots are being deployed in increasingly unstructured environments, where they are expected to carry out complex and dynamic tasks such as autonomous movement and mobile manipulation. Such learning-based robots must not only acquire basic information about their environments but also build a semantic understanding of them, through capabilities such as object detection and semantic classification.

Typically, a static model pretrained on a variety of data is deployed in a learning-based robot system. A robot expected to understand semantics, i.e. what is happening in a scene, would therefore learn to do so only during its pretraining phase. This approach poses three main challenges: the model may need to be retrained to incorporate new data; acquired knowledge should be preserved while adapting to new tasks and environments; and training signals from the environment are required during deployment.

In the recent paper Self-Improving Semantic Perception on a Construction Robot, an ETH Zurich research team proposes a new approach that combines continual learning and self-supervision in a novel robotic system to enable online life-long self-supervised learning of semantic scene understanding.


The idea of self-improving robotic learning agents has been explored in previous work under two frameworks: reinforcement learning (RL) and online parameter optimization for model predictive control. With RL, robots can learn to perform their required tasks (walking, grasping objects, flying, etc.), but once these skills are acquired the learned models are fixed. Such robots thus lack any life-long learning capability. Online parameter optimization for model predictive control enables robots to benefit from on-the-job learning but does not address another problem: forgetting.

Many previous studies on self-supervised learning have focused on learning useful image features in convolutional neural networks. A drawback is that these approaches require supervision to relate the learned features to any meaning. Other approaches aim at producing pseudolabels for image segmentation, such as the class activation maps (CAM) of image classifiers, which generate sparse regional annotations for an image. The new paper refines the latter approach by using an environment’s observable characteristics to produce a learning signal for the target task, while at the same time leveraging existing annotated data from related tasks as prior knowledge.

In continual learning, neural network models are trained on non-stationary data distributions over a variety of tasks and domains. The goal is to optimize performance on each task while maintaining performance on earlier tasks as knowledge is transferred from a previous task to the current one. One way to do this is to store all data from previous tasks and retrain the network from scratch for each new task. This approach, however, is impractical when memory is limited and models need to be updated and deployed at the same time. To address this, the researchers propose “replay buffers” that supplement the training data in each new environment via a memory function that maintains a limited number of samples from the previous environment.
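
The replay idea can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ReplayBuffer`, the `mix` method, the `replay_fraction` parameter, and the use of reservoir sampling to choose which samples to keep are all assumptions made for illustration. The article only states that a limited number of samples from the previous environment is retained.

```python
import random


class ReplayBuffer:
    """Hypothetical bounded memory of samples from earlier environments."""

    def __init__(self, capacity, seed=0):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.samples = []
        self._seen = 0  # total number of samples offered so far
        self._rng = random.Random(seed)

    def add(self, sample):
        # Reservoir sampling (an assumed policy): every sample offered so far
        # has the same probability of being in the buffer.
        self._seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            j = self._rng.randrange(self._seen)
            if j < self.capacity:
                self.samples[j] = sample

    def mix(self, new_batch, replay_fraction=0.5):
        # Supplement data from the current environment with stored samples.
        n_replay = min(len(self.samples), int(len(new_batch) * replay_fraction))
        return list(new_batch) + self._rng.sample(self.samples, n_replay)


# Usage: store data from a first environment, then train in a second one.
buffer = ReplayBuffer(capacity=3)
for i in range(10):
    buffer.add(("garage", i))

office_batch = [("office", i) for i in range(4)]
training_batch = buffer.mix(office_batch, replay_fraction=0.5)
```

Here the training batch holds the four new office samples plus two replayed garage samples, so memory use stays bounded by `capacity` regardless of how many environments the robot has visited.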

Previous studies applying continual learning to semantic segmentation have generally assumed that both the source and target domains are known at training time, and their models were not designed to be updated online. In contrast, the ETH Zurich approach assumes that the deployment domain is unknown beforehand and that the agent must continually update its semantic knowledge of the current environment without forgetting previously seen environments.


Putting these pieces together, the proposed self-improving perception system interlinks localization within a map and semantic segmentation of the scene. The researchers create pseudolabels based on the map localization to train the semantic segmentation, and use the resulting foreground and background segmentation to inform the localization, creating a feedback loop that yields improvements for both parts.
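
One plausible way to picture this feedback loop is sketched below. Everything in it is an assumption for illustration rather than the paper's method: the function names, the idea of comparing measured distances against distances expected from the map, and the `tolerance` value are all hypothetical. The article only says that pseudolabels come from map localization and that the foreground and background segmentation in turn informs localization.

```python
BACKGROUND = 0  # measurement agrees with the map (e.g. walls, floor)
FOREGROUND = 1  # measurement disagrees with the map (e.g. clutter)


def make_pseudolabels(measured, expected, tolerance=0.1):
    """Label each measurement by its agreement with the map (assumed rule).

    measured: distances observed by the sensor, None where invalid
    expected: distances predicted from the map at the estimated pose
    """
    labels = []
    for m, e in zip(measured, expected):
        if m is None or e is None:
            labels.append(None)  # no label where data is missing
        elif abs(m - e) <= tolerance:
            labels.append(BACKGROUND)
        else:
            labels.append(FOREGROUND)
    return labels


def background_only(measured, labels):
    """Keep only measurements labeled background for use in localization."""
    return [m for m, lab in zip(measured, labels) if lab == BACKGROUND]


# Usage: labels would train the segmentation; the filtered measurements
# would feed back into localization.
measured = [2.0, 1.2, None, 3.05]
expected = [2.02, 2.5, 4.0, 3.0]
labels = make_pseudolabels(measured, expected)
filtered = background_only(measured, labels)
```

In this toy example the second measurement is much closer than the map predicts, so it is treated as foreground and excluded from the data passed back to localization.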

The team evaluated the proposed framework in steps of increasing complexity and in three different environments: a construction site, a parking garage, and an office. To test self-improvement, the robot was deployed in different unknown environments and the resulting improvements were measured. To evaluate forgetting and knowledge transfer, deployment was switched between environments. They also conducted an experiment to test the robot’s online learning capabilities.


The experimental results validated that the proposed system can self-improve in diverse environments and that the proposed memory replay is an effective way to mitigate forgetting, demonstrating that the approach can endow robotic systems with self-improving, continual, and online learning capabilities.

The paper Self-Improving Semantic Perception on a Construction Robot is on arXiv.

Author: Hecate He | Editor: Michael Sarazen
