Autonomous robotic systems capable of assembling new objects through visuo-spatial reasoning hold great potential for a broad range of real-world applications. Despite swift advances in part assembly, existing approaches remain limited to pre-defined targets or familiar categories.
To tackle this limitation, in the new paper General Part Assembly Planning, a joint research team from Columbia University and Google DeepMind introduces General Part Assembly Transformer (GPAT), a transformer-based model for assembly planning that generalizes to a wide variety of novel target shapes and parts.
The team summarizes their main contributions as follows:
- We propose the task of general part assembly to study the ability of building novel targets with unseen parts.
- We tackle the planning problem for general part assembly as a goal-conditioned shape rearrangement problem – treating part assembly as an “open-vocabulary” (i.e., vocabulary of parts) target object segmentation task.
- We introduce General Part Assembly Transformer (GPAT) for assembly planning, which can be trained to generalize to novel and diverse target and part shapes.
The team describes the task as follows: given a target point cloud and a set of part point clouds as inputs, the model is trained to predict a 6-DoF pose for each input part so that the parts form the final assembly. The proposed solution has two steps: 1) target segmentation, which uses General Part Assembly Transformer (GPAT) to decompose the target into disjoint segments, each representing the fine-grained details of a transformed part; 2) pose estimation, which takes the set of parts and the target segments as inputs and finds the final 6-DoF pose for each part.
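To make the two-step structure concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's implementation: the per-point `labels` array is a hypothetical stand-in for the segmentation GPAT would predict, and the pose step uses the classic Kabsch algorithm, which assumes each part's points are already matched one-to-one with the points of its target segment. The function names `kabsch_pose` and `plan_assembly` are illustrative only.

```python
import numpy as np


def kabsch_pose(part, segment):
    """Rigid pose (R, t) that best maps part[i] onto segment[i] (Kabsch).

    Assumption: rows of `part` and `segment` correspond one-to-one.
    """
    part_c = part.mean(axis=0)
    seg_c = segment.mean(axis=0)
    # 3x3 cross-covariance between the centered point sets.
    H = (part - part_c).T @ (segment - seg_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = seg_c - R @ part_c
    return R, t


def plan_assembly(parts, target, labels):
    """Step 1 output (labels) splits the target; step 2 fits one pose per part."""
    poses = []
    for i, part in enumerate(parts):
        segment = target[labels == i]  # points of the target assigned to part i
        poses.append(kabsch_pose(part, segment))
    return poses


def rot_z(theta):
    """Rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Synthetic demo: build a target by placing two random parts at known poses.
rng = np.random.default_rng(0)
parts = [rng.normal(size=(50, 3)), rng.normal(size=(40, 3))]
true_poses = [
    (rot_z(0.5), np.array([1.0, 0.0, 2.0])),
    (rot_z(-1.2), np.array([0.0, 3.0, 0.0])),
]
target = np.concatenate([p @ R.T + t for p, (R, t) in zip(parts, true_poses)])
# Hypothetical "perfect" segmentation: one part index per target point.
labels = np.concatenate([np.full(len(p), i) for i, p in enumerate(parts)])

poses = plan_assembly(parts, target, labels)
```

In this toy setting the recovered poses match the ones used to build the target. Real inputs would have no known point correspondences and imperfect segments, so the pose step would need a more robust alignment method.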
In their empirical study, the team evaluated GPAT on PartNet, a large-scale dataset of 3D objects, as well as on real-world data. GPAT demonstrates competitive performance under all the generalization scenarios, achieving a high success rate on novel and diverse target and part shapes.
Overall, this work demonstrates the generative capability of GPAT, and the team believes it has great potential for building vision-based general robotic assembly systems.
The paper General Part Assembly Planning is on arXiv.
Author: Hecate He | Editor: Chain Zhang