Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation

Training perception models for self-driving currently relies on collecting massive amounts of real-world data. However, collecting and labeling that data is expensive. Sampling data from self-driving simulators is therefore an attractive alternative.

A recent paper on arXiv.org seeks to find the best strategies for exploiting a driving simulator.

Driving simulator. Image credit: Toyota Motorsport GmbH via Flickr, CC BY-ND 2.0

The researchers draw on recent advances in domain adaptation theory and, from this perspective, propose a theoretically inspired method for synthetic data generation. The proposed sampling technique is simple to implement in practice: it reduces the distance between the marginal label distributions of the synthetic and real domains, which in turn allows adversarial frameworks to learn meaningful, domain-invariant representations.
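
To make the idea concrete, here is a minimal sketch (not the authors' code) of matching label marginals by importance-weighted resampling: simulated scenes are re-weighted so that a scene-level label statistic, here the number of vehicles per scene, follows the distribution observed in real data. The function name and the choice of statistic are assumptions made for illustration.

```python
# A minimal sketch of label-marginal matching by resampling. We assume each
# simulated scene exposes a scene-level label statistic (here, vehicle count)
# and that the same statistic is available from real annotations.
import numpy as np

rng = np.random.default_rng(0)

def sample_matched_scenes(sim_scenes, sim_counts, real_counts, n_bins=10, size=1000):
    """Resample simulated scenes so the marginal distribution of `sim_counts`
    (e.g., vehicles per scene) matches that of `real_counts`."""
    bins = np.histogram_bin_edges(np.concatenate([sim_counts, real_counts]), bins=n_bins)
    p_sim, _ = np.histogram(sim_counts, bins=bins, density=True)
    p_real, _ = np.histogram(real_counts, bins=bins, density=True)
    # Importance weight of each simulated scene: how under- or over-represented
    # its label bin is relative to the real marginal.
    idx = np.clip(np.digitize(sim_counts, bins) - 1, 0, n_bins - 1)
    w = p_real[idx] / np.maximum(p_sim[idx], 1e-8)
    w /= w.sum()
    chosen = rng.choice(len(sim_scenes), size=size, replace=True, p=w)
    return [sim_scenes[i] for i in chosen]

# Toy usage: simulated scenes over-represent sparse traffic.
sim_counts = rng.poisson(2.0, 5000).astype(float)   # vehicles per simulated scene
real_counts = rng.poisson(6.0, 2000).astype(float)  # vehicles per real scene
sim_scenes = list(range(5000))                      # stand-ins for scene descriptors
matched = sample_matched_scenes(sim_scenes, sim_counts, real_counts)
```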

The authors also present a novel way to combine adversarial methods with pseudo-labels. Experiments validate the efficacy of the method and show that the same approach can be applied to different sensors and data modalities.
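
The sketch below illustrates, under the same caveats, one way such a combination can look in PyTorch: labeled synthetic data drive the task loss, a gradient-reversal layer trains the features to fool a domain classifier, and confident predictions on unlabeled real data serve as pseudo-labels. A small image classifier stands in for the paper's bird's-eye-view segmentation network; all module names, the confidence threshold tau, and the loss weight lam are assumptions.

```python
# An illustrative PyTorch sketch, not the paper's implementation, of combining
# a domain-adversarial loss with pseudo-labels on the real (unlabeled) domain.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

features = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(8), nn.Flatten())
task_head = nn.Linear(16 * 8 * 8, 2)    # e.g., vehicle vs. background
domain_head = nn.Linear(16 * 8 * 8, 2)  # synthetic vs. real

def training_loss(sim_x, sim_y, real_x, lam=0.1, tau=0.9):
    f_sim, f_real = features(sim_x), features(real_x)

    # (1) Supervised loss on labeled synthetic data.
    loss_task = F.cross_entropy(task_head(f_sim), sim_y)

    # (2) Domain-adversarial loss: gradient reversal pushes the feature
    # extractor toward domain-invariant representations.
    f_all = GradReverse.apply(torch.cat([f_sim, f_real]), lam)
    d_labels = torch.cat([torch.zeros(len(sim_x)), torch.ones(len(real_x))]).long()
    loss_dom = F.cross_entropy(domain_head(f_all), d_labels)

    # (3) Pseudo-label loss on real data: keep only confident predictions.
    with torch.no_grad():
        conf, pseudo = F.softmax(task_head(f_real), dim=1).max(dim=1)
    mask = conf > tau
    loss_pl = F.cross_entropy(task_head(f_real)[mask], pseudo[mask]) if mask.any() else 0.0

    return loss_task + loss_dom + loss_pl
```

The two ingredients are complementary in spirit: matching the label marginals (previous sketch) is what makes the adversarial alignment meaningful, while pseudo-labels give the task head a training signal on real data.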

Autonomous driving relies on a huge volume of real-world data to be labeled to high precision. Alternative solutions seek to exploit driving simulators that can generate large amounts of labeled data with a plethora of content variations. However, the domain gap between the synthetic and real data remains, raising the following important question: What are the best ways to utilize a self-driving simulator for perception tasks? In this work, we build on top of recent advances in domain-adaptation theory, and from this perspective, propose ways to minimize the reality gap. We primarily focus on the use of labels in the synthetic domain alone. Our approach introduces both a principled way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator. Our method is easy to implement in practice as it is agnostic of the network architecture and the choice of the simulator. We showcase our approach on the bird’s-eye-view vehicle segmentation task with multi-sensor data (cameras, lidar) using an open-source simulator (CARLA), and evaluate the entire framework on a real-world dataset (nuScenes). Last but not least, we show what types of variations (e.g. weather conditions, number of assets, map design, and color diversity) matter to perception networks when trained with driving simulators, and which ones can be compensated for with our domain adaptation technique.

Research paper: Acuna, D., Philion, J., and Fidler, S., “Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation”, 2021. Link: https://arxiv.org/abs/2111.07971