Any good driver who is about to change lanes knows it’s important to glance over their shoulder to ensure there are no vehicles in their blind spot — and such real-time awareness of nearby vehicles is no less critical for autonomous driving systems. That’s why self-driving technologies rely on a robust perception backbone that is expected to identify all relevant agents in the environment, including accurate “pose and shape” estimation of other vehicles sharing the road.
Autonomous vehicle systems have evolved their own digital approaches to shoulder-checking, leveraging data from one of their most common sensing modalities, LiDAR. Now, a team of researchers from Pittsburgh-based autonomous vehicle technology company Argo AI, Microsoft, and CMU has introduced a novel network architecture for jointly estimating the shape and pose of vehicles, even from partial LiDAR observations.
Existing state-of-the-art (SOTA) methods for pose and shape prediction typically first estimate the pose of an unaligned partial point cloud, then apply that pose to the partial input before estimating the shape. However, in this encoder-pose decoder and encoder-shape decoder architecture, any error in the pose estimation network’s output carries over into shape estimation and ultimately degrades completion performance. Moreover, such a pipeline encodes the partial input twice, which is redundant.
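To make the error-propagation problem concrete, here is a minimal sketch in plain Python. It assumes a planar pose of the form (x, y, yaw) and a handful of 2D points standing in for a LiDAR point cloud; the function names, the example footprint, and the size of the pose error are all hypothetical and are not taken from the paper.

```python
import math


def apply_pose(points, pose):
    """Move points from the vehicle's canonical frame into the sensor frame."""
    tx, ty, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def apply_inverse_pose(points, pose):
    """Align sensor-frame points to the canonical frame using a pose estimate."""
    tx, ty, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    aligned = []
    for x, y in points:
        dx, dy = x - tx, y - ty
        # Rotate by -yaw to undo the vehicle's heading.
        aligned.append((c * dx + s * dy, -s * dx + c * dy))
    return aligned


def mean_point_error(a, b):
    """Average Euclidean distance between corresponding points."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


# Hypothetical partial observation: three corners of a car footprint (metres).
canonical = [(2.0, 0.9), (2.0, -0.9), (-2.0, -0.9)]
true_pose = (10.0, 5.0, math.radians(30))
observed = apply_pose(canonical, true_pose)

# Aligning with the exact pose recovers the canonical points.
err_exact = mean_point_error(apply_inverse_pose(observed, true_pose), canonical)

# Aligning with a slightly wrong pose (0.3 m and 5 degrees off) hands the
# shape stage a misaligned input before it has done any work.
noisy_pose = (10.3, 5.0, math.radians(35))
err_noisy = mean_point_error(apply_inverse_pose(observed, noisy_pose), canonical)

print(f"alignment error with exact pose: {err_exact:.6f} m")
print(f"alignment error with noisy pose: {err_noisy:.6f} m")
```

In a sequential pipeline the shape decoder only ever sees the aligned points, so whatever misalignment the pose stage introduces becomes part of the shape stage's input.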
Why not just use a shared-encoder network for estimating pose and shape?
Applying this strategy, the researchers merged encoding into one process in a bid to reduce redundancy and provide stable pose and shape estimation in a shared-encoder network.
The training of the shared-encoder network has two parts. The encoder and completion decoder are first trained for shape completion. The next step is to freeze the encoder and train the pose decoder using the codes produced by the frozen encoder. Freezing, that is, holding a trained component’s weights fixed while other parts of the network continue to learn, is a common technique in neural network training; here it ensures that the codes the pose decoder sees are the same ones learned for shape completion. A pose estimator trained in this way is noticeably more accurate than the baseline networks.
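The two-stage schedule can be sketched with a deliberately tiny toy model. In the plain-Python example below, each "network" is a single scalar weight and the data are made-up numbers; these are assumptions for illustration only, since the paper's encoder and decoders are deep networks operating on point clouds. The point is only the training order and which weights are allowed to change in each stage.

```python
# Toy stand-ins: every "network" is one scalar weight (illustrative assumption).
inputs = [-1.0, -0.5, 0.5, 1.0]
shape_targets = [2.0 * x for x in inputs]   # hypothetical shape labels
pose_targets = [-3.0 * x for x in inputs]   # hypothetical pose labels

params = {"encoder": 0.5, "shape_decoder": 0.5, "pose_decoder": 0.0}


def mse_and_grads(params, decoder, targets):
    """Return the MSE of decoder(encoder(x)) and its gradients."""
    enc, dec = params["encoder"], params[decoder]
    n = len(inputs)
    loss, g_enc, g_dec = 0.0, 0.0, 0.0
    for x, t in zip(inputs, targets):
        residual = dec * enc * x - t
        loss += residual ** 2 / n
        g_enc += 2.0 * residual * dec * x / n
        g_dec += 2.0 * residual * enc * x / n
    return loss, {"encoder": g_enc, decoder: g_dec}


def train(params, decoder, targets, trainable, steps=200, lr=0.05):
    """Gradient descent that only updates the weights named in `trainable`."""
    for _ in range(steps):
        _, grads = mse_and_grads(params, decoder, targets)
        for name in trainable:
            params[name] -= lr * grads[name]
    return mse_and_grads(params, decoder, targets)[0]


# Stage 1: train the encoder and the completion (shape) decoder together.
shape_loss = train(params, "shape_decoder", shape_targets,
                   trainable=("encoder", "shape_decoder"))
encoder_after_stage1 = params["encoder"]

# Stage 2: freeze the encoder; only the pose decoder is updated.
pose_loss = train(params, "pose_decoder", pose_targets,
                  trainable=("pose_decoder",))

print(f"shape loss after stage 1: {shape_loss:.2e}")
print(f"pose loss after stage 2:  {pose_loss:.2e}")
print("encoder unchanged in stage 2:", params["encoder"] == encoder_after_stage1)
```

Because the encoder is excluded from the trainable set in the second stage, shape completion behaves exactly as it did after the first stage, and the pose decoder simply learns to read the existing codes.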
The team’s work on a shared-encoder network led them to discover another trick: training with a combined pose and shape loss. This approach improved performance over the independently trained shape and pose modules, with the novel architecture achieving SOTA results on both realistic synthetic data and real-world data.
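The article does not spell out the exact form of the combined loss, so the following is only one plausible shape for such an objective, written in plain Python. It assumes a weighted sum of a Chamfer distance between predicted and reference point sets and a planar pose error; the function names, the (x, y, yaw) pose format, and the weights are hypothetical and should not be read as the paper's formulation.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (lists of tuples)."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def pose_error(pred, target):
    """Planar pose error: translation distance plus wrapped yaw difference."""
    (px, py, pyaw), (tx, ty, tyaw) = pred, target
    # Wrap the yaw difference into [-pi, pi) so headings near +/-pi compare sanely.
    dyaw = (pyaw - tyaw + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(px - tx, py - ty) + abs(dyaw)


def joint_loss(pred_shape, true_shape, pred_pose, true_pose,
               shape_weight=1.0, pose_weight=1.0):
    """Weighted sum of a shape term and a pose term (weights are assumptions)."""
    return (shape_weight * chamfer_distance(pred_shape, true_shape)
            + pose_weight * pose_error(pred_pose, true_pose))


# Hypothetical example: a slightly wrong shape and a slightly wrong pose.
true_shape = [(2.0, 0.9), (2.0, -0.9), (-2.0, -0.9), (-2.0, 0.9)]
pred_shape = [(1.9, 0.9), (2.0, -1.0), (-2.0, -0.9), (-2.1, 0.9)]
true_pose = (10.0, 5.0, 0.5)
pred_pose = (10.2, 5.0, 0.55)

perfect = joint_loss(true_shape, true_shape, true_pose, true_pose)
imperfect = joint_loss(pred_shape, true_shape, pred_pose, true_pose)
print("loss for a perfect prediction:", perfect)
print("loss for an imperfect prediction:", round(imperfect, 4))
```

With a single objective of this kind, a gradient step can trade off pose and shape errors against each other rather than optimizing the two modules in isolation.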
Argo AI will further explore using the estimated shape model in downstream modules such as tracking, and adopting the novel architecture in real-time systems.
The paper Joint Pose and Shape Estimation of Vehicles from LiDAR Data is on arXiv.
Reporter: Fangyu Cai | Editor: Michael Sarazen