Self-driving AI model combines perception and control

Tuesday, 08 November, 2022

A research team from the Department of Computer Science and Engineering at Toyohashi University of Technology has developed an AI model that handles perception and control simultaneously for an autonomous vehicle. The model perceives the environment by completing several vision tasks while driving the vehicle along a sequence of route points. It can also drive the vehicle safely in diverse environmental conditions under various scenarios. Evaluated on point-to-point navigation tasks in a standard simulation environment, the model reportedly achieved the best drivability among the models it was compared against.

An autonomous driving system is complex, consisting of several subsystems that handle multiple perception and control tasks. However, deploying multiple task-specific modules can be costly and inefficient, as numerous configurations are still needed to form an integrated modular system. Furthermore, the integration process can lead to information loss, as many parameters are adjusted manually. With rapid progress in deep learning research, this issue can be addressed by training a single AI model in an end-to-end, multi-task manner. The model can then provide navigational controls directly from the observations supplied by a set of sensors. As manual configuration is no longer needed, the model can manage the information by itself.

The challenge that remains for an end-to-end model is how to extract useful information so that the controller can estimate the navigational controls properly. One approach is to provide more data to the perception module so that it can better perceive the surrounding environment. A sensor fusion technique can also enhance performance, as it combines data from different sensors to capture various aspects of the scene. However, a heavy computational load is inevitable, as a bigger model is needed to process more data. Moreover, data pre-processing is necessary because different sensors often produce different data modalities. Imbalanced learning during training can be another issue, since the model performs perception and control tasks simultaneously.

To address those challenges, the researchers propose an AI model trained in an end-to-end, multi-task manner. The model is made of two main modules: a perception module and a controller module. The perception phase begins by processing RGB images and depth maps provided by a single RGBD camera. Then, the information extracted by the perception module, along with the vehicle speed measurement and route point coordinates, is decoded by the controller module to estimate the navigational controls. To ensure that all tasks are learned equally well, the team employs an algorithm called modified gradient normalisation (MGN) to balance the learning signal during the training process.
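The idea of balancing the learning signal across tasks can be illustrated with a minimal sketch. This is not the team's MGN algorithm, whose details are not given here; it is a simplified update in the style of the published GradNorm method, written in plain Python. The function name, the parameters `alpha` and `lr`, and the input values are all assumptions made for illustration.

```python
def gradnorm_step(weights, grad_norms, loss_ratios, alpha=1.5, lr=0.025):
    """One simplified GradNorm-style update of per-task loss weights.

    weights     -- current loss weight w_i for each task
    grad_norms  -- gradient norm g_i of each unweighted task loss
                   with respect to the shared layers
    loss_ratios -- L_i(t) / L_i(0), the share of each task's initial
                   loss that still remains
    alpha, lr   -- hypothetical balancing strength and step size
    """
    n = len(weights)
    # Weighted gradient norm per task, and their mean.
    weighted = [w * g for w, g in zip(weights, grad_norms)]
    mean_norm = sum(weighted) / n
    # Relative inverse training rate: above 1 means the task is lagging.
    mean_ratio = sum(loss_ratios) / n
    rel_rate = [r / mean_ratio for r in loss_ratios]
    # Target gradient norm for each task (treated as a constant).
    targets = [mean_norm * (r ** alpha) for r in rel_rate]
    # Gradient step on sum_i |w_i * g_i - target_i| with respect to w_i.
    new_weights = []
    for w, g, wg, t in zip(weights, grad_norms, weighted, targets):
        sign = (wg > t) - (wg < t)
        new_weights.append(max(w - lr * sign * g, 1e-6))
    # Renormalise so the weights always sum to the number of tasks.
    scale = n / sum(new_weights)
    return [w * scale for w in new_weights]
```

For example, calling `gradnorm_step([1.0, 1.0], [2.0, 0.5], [0.5, 0.9])` lowers the weight of the first task, which dominates the gradient and has already learned quickly, and raises the weight of the second, slower task. In a real training loop the gradient norms would come from the deep learning framework rather than being passed in by hand.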

The researchers considered imitation learning, as it allows the model to learn from a large-scale dataset to match a near-human standard. Furthermore, the team designed the model to use fewer parameters than other models, reducing the computational load and accelerating inference on devices with limited resources.

Experimental results in CARLA, a standard autonomous driving simulator, showed that fusing RGB images and depth maps to form a bird’s-eye-view semantic map can boost overall performance. As the perception module has a better overall understanding of the scene, the controller module can leverage useful information to estimate the navigational controls properly. The researchers said that the proposed model is preferable for deployment, as it achieves better drivability with fewer parameters than other models.
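As a rough illustration of how depth can turn image-space semantics into a bird's-eye-view map, the sketch below projects per-pixel class labels onto a top-down grid. It is not the researchers' implementation: it assumes a simple pinhole camera, depth values given as forward distance in metres, class labels already predicted for each pixel, and a hypothetical grid size and cell resolution.

```python
def depth_to_bev(depth, labels, fx, cx, grid_size=8, cell_m=1.0):
    """Project per-pixel class labels onto a top-down grid using depth.

    depth  -- 2D list of forward distances in metres (0 means no reading)
    labels -- 2D list of integer class ids, same shape as depth
    fx, cx -- pinhole focal length and principal point in pixels (assumed known)
    Returns a grid_size x grid_size grid. Row 0 is nearest the vehicle,
    the vehicle sits at the centre column, and empty cells hold -1.
    """
    bev = [[-1] * grid_size for _ in range(grid_size)]
    half_width = grid_size * cell_m / 2.0
    for row_d, row_l in zip(depth, labels):
        for u, (z, cls) in enumerate(zip(row_d, row_l)):
            if z <= 0:
                continue  # skip pixels with no valid depth
            x = (u - cx) * z / fx  # lateral offset in metres
            col = int((x + half_width) // cell_m)
            row = int(z // cell_m)
            if 0 <= row < grid_size and 0 <= col < grid_size:
                bev[row][col] = cls  # last write wins in this sketch
    return bev
```

A fuller version would also use the vertical image coordinate to filter points by height and would resolve conflicts when several pixels fall into the same cell. The sketch keeps only the core step of using depth to place each labelled pixel on the ground plane.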

The researchers are working on modifications and improvements to the model in order to tackle several issues when driving in poorly illuminated conditions, such as at night or in heavy rain. As a hypothesis, the researchers believe that adding a sensor that is unaffected by changes in brightness or illumination, such as LiDAR, will improve the model’s scene-understanding capabilities and result in better drivability. The researchers also aim to apply the proposed model to autonomous driving in the real world.
